MIBM Performance Optimizations #1157

Open
danieljvickers wants to merge 17 commits into MFlowCode:master from danieljvickers:gpu-optimizations

Conversation

@danieljvickers (Member) commented Feb 17, 2026

User description

Description

Following the levelset refactor, several performance optimizations remained to be made to the code. This PR introduces optimizations that make the multi-particle MIBM code viable. It also raises the upper bound on the number of allowed immersed boundaries to 1000. Performance was measured on 1-4 ranks of ACC GPU compute using A100 GPUs.

This PR also extends these optimizations to STL IBs, which are now fully capable of running on the GPU. An example is given below.

Type of change

  • New feature
  • Refactor
  • Other: Performance Tuning

Testing

All changes pass the IBM section of the test suite on GPUs with the NVHPC compiler. Performance was measured with a case of 1000 particles with viscosity enabled. The particles are all resolved 3D spheres placed at random, non-overlapping positions generated by the following case file:

import json
import argparse

num_cells = [240, 240, 240]
dim = [8., 8., 8.]
num_particles = 1000

# create a stationary fluid
case = {
    "run_time_info": "T",
    "parallel_io": "T",
    "m": num_cells[0]-1,
    "n": num_cells[1]-1,
    "p": num_cells[2]-1,
    "dt": 0.005,
    "t_step_start": 0,
    "t_step_stop": 2,
    "t_step_save": 1,
    "num_patches": 1,
    "model_eqns": 2,
    "alt_soundspeed": "F",
    "num_fluids": 1,
    "mpp_lim": "F",
    "mixture_err": "T",
    "time_stepper": 3,
    "recon_type": 1,
    "weno_eps": 1e-16,
    "riemann_solver": 2,
    "wave_speeds": 1,
    "avg_state": 2,
    "precision": 2,
    "format": 1,
    "prim_vars_wrt": "T",
    "E_wrt": "T",
    "viscous": "T",
    "x_domain%beg": -0.5*dim[0],
    "x_domain%end": 0.5*dim[0],
    "y_domain%beg": -0.5*dim[1],
    "y_domain%end": 0.5*dim[1],
    "z_domain%beg": -0.5*dim[2],
    "z_domain%end": 0.5*dim[2],
    "bc_x%beg": -3,
    "bc_x%end": -3,
    "bc_y%beg": -3,
    "bc_y%end": -3,
    "bc_z%beg": -3,
    "bc_z%end": -3,
    "patch_icpp(1)%geometry": 9,
    "patch_icpp(1)%z_centroid": 0.0,
    "patch_icpp(1)%length_z": dim[2],
    "patch_icpp(1)%y_centroid": 0.0,
    "patch_icpp(1)%length_y": dim[1],
    "patch_icpp(1)%x_centroid": 0.0,
    "patch_icpp(1)%length_x": dim[0],
    "weno_order": 5,
    "patch_icpp(1)%pres": 1.0,
    "patch_icpp(1)%alpha_rho(1)": 1.0,
    "patch_icpp(1)%alpha(1)": 1.0,
    "patch_icpp(1)%vel(1)": 0.0,
    "patch_icpp(1)%vel(2)": 0.0,
    "patch_icpp(1)%vel(3)": 0.0,
    "fluid_pp(1)%gamma": 2.5000000000000004,
    "fluid_pp(1)%pi_inf": 0.0,
    "fluid_pp(1)%Re(1)": 2500000,
}

import random
random.seed(42)

dx = [float(dim[i]) / float(num_cells[i]) for i in range(3)]

# set particle properties
particle_radius_cells = 5 # particle radius in grid cells
particle_cell_spacing = particle_radius_cells*2 + 5 # set the spacing to be double the radius plus 5 to guarantee no image points in other IBs
mpi_cell_spacing = particle_radius_cells + 5 # space safely away from MPI halo regions to prevent out-of-bounds errors
generation_bounds = [6., 6., 6.] # generate particles in this box safely away from the boundary
velocity_magnitude = 1.

# convert non-dimensional grid cell units to the units of the simulation
radius_units = float(particle_radius_cells) * dx[0]
particle_units_spacing = float(particle_cell_spacing) * dx[0]
particle_units_spacing_squared = particle_units_spacing**2
mpi_units_spacing = [float(mpi_cell_spacing) * dx[i] for i in range(3)]

# generate an array of xyz values that guarantees non-overlapping grid cells
particles = []
while len(particles) < num_particles:
  # generate a completely random position
  position = [random.random() for i in range(3)]
  position = [(position[i] - 0.5) * generation_bounds[i] for i in range(3)]

  # first check if the particle is too close to the MPI halo regions
  valid = True
  # for i in range(3):
  #   valid = valid and abs(position[i]) >= mpi_units_spacing[i]

  # check for the minimum spacing between all particles, exiting once we find an error
  for particle in particles:
    distance_squared = sum([(particle[i] - position[i])**2 for i in range(3)])
    valid = valid and distance_squared >= particle_units_spacing_squared
    if not valid:
      break

  if valid:
    particles.append(position)
    # print(f"\rProgress: {100.*float(len(particles))/float(num_particles)}%", end="", flush=True)

# print()
# convert our array of positions to valid JSON for the immersed boundary
ib_properties = {"ib": "T", "num_ibs": num_particles,}
for i in range(len(particles)):
  ib_properties[f'patch_ib({i+1})%radius'] = radius_units
  ib_properties[f'patch_ib({i+1})%slip'] = 'F'
  ib_properties[f'patch_ib({i+1})%geometry'] = 8
  ib_properties[f'patch_ib({i+1})%moving_ibm'] = 2
  ib_properties[f'patch_ib({i+1})%mass'] = 1.
  ib_properties[f'patch_ib({i+1})%x_centroid'] = particles[i][0]
  ib_properties[f'patch_ib({i+1})%y_centroid'] = particles[i][1]
  ib_properties[f'patch_ib({i+1})%z_centroid'] = particles[i][2]
  # set the velocity radially outward to guarantee the particles never touch during the simulation
  position_mag = (sum([(particles[i][j])**2 for j in range(3)]))**0.5
  for j in range(3):
    ib_properties[f'patch_ib({i+1})%vel({j+1})'] = particles[i][j] * velocity_magnitude / position_mag

print(json.dumps({**case, **ib_properties}))

These optimizations yield a nearly 1000x speedup in the moving-IBM propagation and generation code. Prior to these optimizations, the benchmark case profiled with NVIDIA Nsight took about 45 seconds to run a single RK substep:

[Screenshot: Nsight profile, 2026-02-16]

Following these optimizations, the same profile shows roughly 50 ms per RK substep:
[Screenshot: Nsight profile, 2026-02-17]

Checklist

  • I updated documentation if user-facing behavior changed

See the developer guide for full coding standards.

GPU changes (expand if you modified src/simulation/)
  • GPU results match CPU results
  • Tested on NVIDIA GPU or AMD GPU

Summary by CodeRabbit

  • New Features

    • GPU-accelerated handling for STL/model-based levelset and immersed-boundary computations (3D-capable)
  • Enhancements

    • Increased maximum patches and immersed boundaries to 1000
    • Improved STL model packing and GPU-friendly data paths for faster distance/normals and inside tests
    • More targeted domain bounding to reduce work in patch/IB routines
    • Added runtime profiling markers to aid performance analysis

CodeAnt-AI Description

Offload immersed-boundary marker, ghost/image, and STL model work to GPU and allow 1000 IBs

What Changed

  • Immersed-boundary marker generation for all patch geometries now updates a GPU-aware marker structure and runs with GPU-parallel loops (circle, rectangle, ellipse, sphere, cuboid, cylinder, airfoil, STL).
  • Ghost-point detection, image-point computation, and interpolation-coefficient computation are performed on the GPU, with reduced work ranges determined by a binary-search bounding-index helper (see the sketch after this list) to avoid scanning the whole domain.
  • STL models are read, transformed, packed into flat arrays, uploaded to the device, and used on-device for "point inside model" and nearest-triangle queries in 2D and 3D levelset calculations.
  • Case/config tooling and constants updated to permit up to 1000 immersed boundaries; validator and parameter definitions reflect the new limits.
  • Lightweight profiling ranges (NVTX) added around key IBM stages to aid GPU performance analysis.
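
To make the bounding-index idea above concrete, here is a minimal Python sketch (the Fortran helper in the PR is named get_bounding_indices, but its exact signature is not shown here, so treat the interface below as an assumption):

import bisect

def get_bounding_indices(cell_centers, lo, hi):
    """Return the index range of cells whose centers lie inside [lo, hi].

    cell_centers is assumed to be monotonically increasing, so a binary
    search (O(log N)) replaces a full scan of the axis (O(N)).
    """
    i_beg = bisect.bisect_left(cell_centers, lo)
    i_end = bisect.bisect_right(cell_centers, hi) - 1
    return i_beg, i_end

# Hypothetical usage: restrict an IB loop to the patch's bounding box in x
x_cc = [0.1*i for i in range(100)]            # cell centers along x
i_beg, i_end = get_bounding_indices(x_cc, 2.3, 4.7)
for i in range(i_beg, i_end + 1):
    pass                                      # only cells near the IB are visited

Applied per axis, this turns a full m-by-n-by-p marker/levelset sweep into a loop over the few cells surrounding each immersed boundary.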

Impact

✅ Faster IB setup on GPU
✅ Lower CPU load during IB and STL processing
✅ Support for up to 1000 immersed boundaries



coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

This PR adds GPU-ready data structures and packing for STL models, migrates levelset and IBM computations to GPU-friendly paths, replaces raw IB marker arrays with a typed interface, expands patch limits to 1000, and adds profiling/instrumentation and parallel macro support for atomic-capture endings.

Changes

Cohort / File(s) Summary
Parallel macros
src/common/include/parallel_macros.fpp
Added END_GPU_ATOMIC_CAPTURE() macro to emit OpenACC/OpenMP atomic-capture end directives.
Configuration & limits
src/common/m_constants.fpp, toolchain/mfc/params/definitions.py, toolchain/mfc/case_validator.py, toolchain/mfc/params_tests/test_definitions.py
Increased max patches/IBs from 10 to 1000 and updated related validation and tests (parameter-count expectation adjusted).
Derived types & helpers
src/common/m_derived_types.fpp, src/common/m_helper.fpp
Changed t_model_array%interpolate from logical to integer; added GPU-oriented fields (ntrs, trs_v, trs_n) and annotated f_cross with GPU_ROUTINE.
Model GPU packing & APIs
src/common/m_model.fpp
Added GPU-ready public arrays and accessors (models type with GPU fields, gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_boundary_v, gpu_interpolated_boundary_v, gpu_interpolate, etc.), plus s_pack_model_for_gpu and the flat variants f_model_is_inside_flat, f_distance_normals_3d_flat (see the packing sketch after this list).
Levelset computation
src/simulation/m_compute_levelset.fpp
Switched model-levelset paths to GPU-packed data and flat distance/normals routines; changed local interpolate declarations to integer and adjusted 2D/3D GPU-parallel flows accordingly.
IB patch framework
src/simulation/m_ib_patches.fpp
Replaced ib_markers_sf arrays with typed ib_markers (type(integer_field)), added get_bounding_indices, heavily refactored s_instantiate_STL_models to pack GPU data and constrain GPU loop ranges.
IBM GPU integration & profiling
src/simulation/m_ibm.fpp, src/simulation/m_time_steppers.fpp
Unified 2D/3D IBM flows for GPU, introduced NVTX ranges, atomic counters for ghost-point aggregation, GPU-parallel regions and device updates; added NVTX start/end in time-step propagation.
Tooling / tests
toolchain/mfc/...
Adjusted validation bounds and param definitions/tests to reflect increased IB limits and larger parameter registry size.
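
As a rough illustration of the GPU-friendly flat arrays described for m_model.fpp above, the packing step can be pictured as copying each model's ragged triangle list into dense, fixed-shape arrays indexed by patch. The sketch below is a Python stand-in with made-up shapes; the real s_pack_model_for_gpu fills gpu_trs_v, gpu_trs_n, and related arrays whose actual layout may differ.

import numpy as np

def pack_models_for_gpu(models):
    """Pack ragged per-model triangle lists into dense arrays for one upload.

    models: list of dicts with 'verts' of shape (ntrs, 3, 3) and
    'normals' of shape (ntrs, 3). Shorter models are zero-padded.
    """
    n_models = len(models)
    max_ntrs = max(m["verts"].shape[0] for m in models)
    ntrs = np.zeros(n_models, dtype=np.int32)        # triangle count per patch
    trs_v = np.zeros((n_models, max_ntrs, 3, 3))     # triangle vertices
    trs_n = np.zeros((n_models, max_ntrs, 3))        # triangle normals
    for p, m in enumerate(models):
        nt = m["verts"].shape[0]
        ntrs[p] = nt
        trs_v[p, :nt] = m["verts"]
        trs_n[p, :nt] = m["normals"]
    return ntrs, trs_v, trs_n

# Hypothetical usage with two tiny "models"
models = [
    {"verts": np.zeros((4, 3, 3)), "normals": np.zeros((4, 3))},
    {"verts": np.zeros((7, 3, 3)), "normals": np.zeros((7, 3))},
]
ntrs, trs_v, trs_n = pack_models_for_gpu(models)

Kernels can then walk trs_v[p, :ntrs[p]] with plain integer indexing instead of chasing derived-type and allocatable indirection on the device.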

Sequence Diagram(s)

sequenceDiagram
    participant CPU as CPU/Host
    participant GPU as GPU Device
    participant Model as Model Data
    participant LevelSet as LevelSet
    participant IBM as IBM

    CPU->>Model: s_instantiate_STL_models()
    Note over Model: Read, transform, compute bounding boxes\npack ntrs, trs_v, trs_n, boundary arrays
    CPU->>GPU: s_pack_model_for_gpu() / upload gpu_* arrays
    CPU->>LevelSet: s_compute_levelset()
    LevelSet->>GPU: f_distance_normals_3d_flat() (parallel)
    GPU-->>LevelSet: distances & normals
    CPU->>CPU: s_apply_ib_patches(ib_markers)  -- uses typed markers
    CPU->>IBM: s_update_mib()/s_find_num_ghost_points()
    IBM->>GPU: GPU kernels (ghost/image points, forces) [NVTX ranges]
    GPU-->>IBM: ghost/force data (atomic counters updated)
    IBM->>CPU: aggregated forces & updated markers

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

Review effort 4/5, size:XXL

Suggested reviewers

  • sbryngelson
  • wilfonba

Poem

🐰 I pack the models, snug and tight,

I send them off to GPU light,
With atoms captured, bounds in sight,
Ghost points counted, forces bright—
Hop, compute, and leap to flight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'MIBM Performance Optimizations' clearly and concisely summarizes the main focus of the changeset, which involves GPU-based performance improvements to immersed boundary method (MIBM) code.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description comprehensively addresses all required template sections including motivation, type of change, testing details with benchmark data, and GPU-specific validation confirmations.





codecov bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 57.08812% with 112 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.24%. Comparing base (c5493c1) to head (bc972ca).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
src/simulation/m_ib_patches.fpp 63.12% 56 Missing and 3 partials ⚠️
src/simulation/m_ibm.fpp 50.00% 23 Missing and 13 partials ⚠️
src/common/m_model.fpp 37.03% 17 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1157      +/-   ##
==========================================
+ Coverage   44.07%   44.24%   +0.17%     
==========================================
  Files          70       70              
  Lines       20431    20505      +74     
  Branches     1974     1950      -24     
==========================================
+ Hits         9004     9073      +69     
- Misses      10291    10306      +15     
+ Partials     1136     1126      -10     


@danieljvickers marked this pull request as ready for review February 19, 2026 18:39
cubic-dev-ai bot (Contributor) left a comment

7 issues found across 12 files

Confidence score: 2/5

  • High risk due to concrete breakages: stl_bounding_boxes re-allocation in src/simulation/m_ib_patches.fpp can crash at runtime with multiple STL patches, and src/simulation/m_compute_levelset.fpp changes make if (interpolate) invalid in Fortran 3D path.
  • An unconditional interpolate = .false. in src/simulation/m_ib_patches.fpp disables STL interpolation, making the intended logic dead and likely altering results.
  • Pay close attention to src/simulation/m_ib_patches.fpp and src/simulation/m_compute_levelset.fpp - crash/compile risks around interpolation and allocation.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="toolchain/mfc/case_validator.py">

<violation number="1" location="toolchain/mfc/case_validator.py:687">
P3: Update the validation error message to match the new upper bound (1000) so users get accurate guidance when the check fails.</violation>
</file>

<file name="src/simulation/m_compute_levelset.fpp">

<violation number="1" location="src/simulation/m_compute_levelset.fpp:647">
P1: Changing interpolate to INTEGER breaks the `if (interpolate) then` check in the 3D-model path; INTEGER values are not valid logical expressions in Fortran, so this will not compile. Either keep interpolate LOGICAL (and assign with a logical comparison) or update the condition to compare against 1.</violation>
</file>

<file name="src/simulation/m_time_steppers.fpp">

<violation number="1" location="src/simulation/m_time_steppers.fpp:805">
P3: Fix the NVTX range label typo (“IMERSED”) so profiling markers are consistent and searchable.</violation>
</file>

<file name="src/common/m_model.fpp">

<violation number="1" location="src/common/m_model.fpp:550">
P2: Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.</violation>
</file>

<file name="src/simulation/m_ib_patches.fpp">

<violation number="1" location="src/simulation/m_ib_patches.fpp:1026">
P1: Debugging override left in: `interpolate = .false.` unconditionally disables STL model interpolation, making the interpolation check dead code. This appears to be a leftover from testing — remove this line to restore the intended interpolation behavior.</violation>

<violation number="2" location="src/simulation/m_ib_patches.fpp:1069">
P1: Runtime crash with multiple STL models: `stl_bounding_boxes` is allocated inside the per-patch loop. When a second STL patch is encountered, Fortran will raise an error because the array is already allocated. Move the allocation before the loop or guard it with a check and use `num_ibs` for the first dimension.</violation>
</file>

<file name="src/simulation/m_ibm.fpp">

<violation number="1" location="src/simulation/m_ibm.fpp:114">
P3: `s_populate_ib_buffers` is now dead code — it is a private subroutine that is no longer called from anywhere after the refactoring removed its callsites in `s_ibm_setup` and `s_update_mib`. Consider removing the subroutine to avoid confusion for future contributors.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


integer :: i, j, k, patch_id, boundary_edge_count, total_vertices
-        logical :: interpolate
+        integer :: interpolate

P1: Changing interpolate to INTEGER breaks the if (interpolate) then check in the 3D-model path; INTEGER values are not valid logical expressions in Fortran, so this will not compile. Either keep interpolate LOGICAL (and assign with a logical comparison) or update the condition to compare against 1.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_compute_levelset.fpp, line 647:

<comment>Changing interpolate to INTEGER breaks the `if (interpolate) then` check in the 3D-model path; INTEGER values are not valid logical expressions in Fortran, so this will not compile. Either keep interpolate LOGICAL (and assign with a logical comparison) or update the condition to compare against 1.</comment>

<file context>
@@ -655,7 +644,7 @@ contains
 
         integer :: i, j, k, patch_id, boundary_edge_count, total_vertices
-        logical :: interpolate
+        integer :: interpolate
         real(wp), dimension(1:3) :: center, xyz_local
         real(wp) :: normals(1:3) !< Boundary normal buffer
</file context>

else
call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate)
end if
interpolate = .false.

P1: Debugging override left in: interpolate = .false. unconditionally disables STL model interpolation, making the interpolation check dead code. This appears to be a leftover from testing — remove this line to restore the intended interpolation behavior.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_ib_patches.fpp, line 1026:

<comment>Debugging override left in: `interpolate = .false.` unconditionally disables STL model interpolation, making the interpolation check dead code. This appears to be a leftover from testing — remove this line to restore the intended interpolation behavior.</comment>

<file context>
@@ -865,83 +806,368 @@ contains
+                else
+                    call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate)
+                end if
+                interpolate = .false.
+
+                ! Show the number of edges and boundary edges in 2D STL models
</file context>

! random jitter in the origin helps us estimate volume fraction instead of only at the cell center
ray_origins(i, k) = point(k) + (f_model_random_number(rand_seed) - 0.5_wp)*spacing(k)
! cast sample rays in all directions
ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp

P2: Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/common/m_model.fpp, line 550:

<comment>Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.</comment>

<file context>
@@ -493,58 +526,116 @@ contains
+                ! random jitter in the origin helps us estimate volume fraction instead of only at the cell center
+                ray_origins(i, k) = point(k) + (f_model_random_number(rand_seed) - 0.5_wp)*spacing(k)
+                ! cast sample rays in all directions
+                ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+            end do
             ray_dirs(i, :) = ray_dirs(i, :)/sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :)))
</file context>
Suggested change
ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
ray_dirs(i, k) = f_model_random_number(rand_seed) - 0.5_wp

self.prohibit(ib and n <= 0,
"Immersed Boundaries do not work in 1D (requires n > 0)")
-        self.prohibit(ib and (num_ibs <= 0 or num_ibs > 10),
+        self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),

P3: Update the validation error message to match the new upper bound (1000) so users get accurate guidance when the check fails.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At toolchain/mfc/case_validator.py, line 687:

<comment>Update the validation error message to match the new upper bound (1000) so users get accurate guidance when the check fails.</comment>

<file context>
@@ -684,7 +684,7 @@ def check_ibm(self):
         self.prohibit(ib and n <= 0,
                      "Immersed Boundaries do not work in 1D (requires n > 0)")
-        self.prohibit(ib and (num_ibs <= 0 or num_ibs > 10),
+        self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
                      "num_ibs must be between 1 and num_patches_max (10)")
         self.prohibit(not ib and num_ibs > 0,
</file context>
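
A minimal sketch of the suggested fix, deriving the message from the same bound as the check so they cannot drift apart again (the constant name and the validator interface here are assumptions, not the toolchain's actual API):

NUM_PATCHES_MAX = 1000  # assumed to mirror num_patches_max in m_constants.fpp

def check_num_ibs(ib: bool, num_ibs: int) -> None:
    # Build the message from the same constant used in the check.
    if ib and (num_ibs <= 0 or num_ibs > NUM_PATCHES_MAX):
        raise ValueError(
            f"num_ibs must be between 1 and num_patches_max ({NUM_PATCHES_MAX})")

check_num_ibs(ib=True, num_ibs=500)   # passes silently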

integer :: i
logical :: forces_computed

call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")

P3: Fix the NVTX range label typo (“IMERSED”) so profiling markers are consistent and searchable.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_time_steppers.fpp, line 805:

<comment>Fix the NVTX range label typo (“IMERSED”) so profiling markers are consistent and searchable.</comment>

<file context>
@@ -802,6 +802,8 @@ contains
         integer :: i
         logical :: forces_computed
 
+        call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
+
         forces_computed = .false.
</file context>
Suggested change
call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
call nvtxStartRange("PROPAGATE-IMMERSED-BOUNDARIES")

@@ -28,6 +28,8 @@ module m_ibm


P3: s_populate_ib_buffers is now dead code — it is a private subroutine that is no longer called from anywhere after the refactoring removed its callsites in s_ibm_setup and s_update_mib. Consider removing the subroutine to avoid confusion for future contributors.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_ibm.fpp, line 114:

<comment>`s_populate_ib_buffers` is now dead code — it is a private subroutine that is no longer called from anywhere after the refactoring removed its callsites in `s_ibm_setup` and `s_update_mib`. Consider removing the subroutine to avoid confusion for future contributors.</comment>

<file context>
@@ -96,19 +100,18 @@ contains
-        call s_apply_ib_patches(ib_markers%sf(0:m, 0:n, 0:p))
         $:GPU_UPDATE(device='[ib_markers%sf]')
-        call s_populate_ib_buffers()
+        call s_apply_ib_patches(ib_markers)
         $:GPU_UPDATE(host='[ib_markers%sf]')
         do i = 1, num_ibs
</file context>

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
src/simulation/m_time_steppers.fpp (1)

805-855: ⚠️ Potential issue | 🟡 Minor

Fix NVTX range label typo for consistent profiling.

The label reads “IMERSED” instead of “IMMERSED”, which can break grouping in profiling tools.

✏️ Proposed fix
-        call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
+        call nvtxStartRange("PROPAGATE-IMMERSED-BOUNDARIES")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_time_steppers.fpp` around lines 805 - 855, The NVTX range
label passed to nvtxStartRange in the MTimeSteppers routine is misspelled
("PROPAGATE-IMERSED-BOUNDARIES"); update the string to the correct label
"PROPAGATE-IMMERSED-BOUNDARIES" so profiling tools group this region correctly
(look for the nvtxStartRange(...) call in the do-loop that updates patch_ib and
ensure nvtxEndRange remains paired).
src/simulation/m_compute_levelset.fpp (1)

641-709: ⚠️ Potential issue | 🔴 Critical

Fix integer interpolate used as logical in 3D model path (won't compile).

Line 691 uses if (interpolate) then but interpolate is declared as integer (line 647). This is not valid Fortran and will fail compilation. The 2D branch already correctly uses if (interpolate == 1) then (line 704).

Proposed fix
             ! Get the shortest distance between the cell center and the interpolated model boundary
-            if (interpolate) then
+            if (interpolate == 1) then
                 gp%levelset = f_interpolated_distance(gpu_interpolated_boundary_v(:, :, patch_id), total_vertices, xyz_local)
             else
                 gp%levelset = distance
             end if
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_compute_levelset.fpp` around lines 641 - 709, The integer
variable interpolate in subroutine s_model_levelset is being used as a logical
in the 3D branch (if (interpolate) then) which will not compile; update that
conditional to explicitly compare the integer (e.g., if (interpolate == 1) then)
or change the variable to a logical and adjust all uses accordingly so that the
3D path (the block that currently calls f_interpolated_distance or assigns
gp%levelset = distance) uses a valid boolean check on interpolate.
src/simulation/m_ibm.fpp (3)

589-676: ⚠️ Potential issue | 🟠 Major

Add patch_id to the GPU private list in s_find_ghost_points.

patch_id is assigned in every iteration (lines 628, 666) and used only within that iteration. Without being declared private, multiple GPU threads will access a shared patch_id variable, causing race conditions when reading from patch_ib(patch_id)%....

Proposed fix
-        $:GPU_PARALLEL_LOOP(private='[i,j,k,ii,jj,kk,is_gp,local_idx]', copyin='[count,count_i, x_domain, y_domain, z_domain]', firstprivate='[gp_layers,gp_layers_z]', collapse=3)
+        $:GPU_PARALLEL_LOOP(private='[i,j,k,ii,jj,kk,is_gp,local_idx,patch_id]', copyin='[count,count_i, x_domain, y_domain, z_domain]', firstprivate='[gp_layers,gp_layers_z]', collapse=3)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 589 - 676, The variable patch_id in
s_find_ghost_points is written per-iteration but not listed as private in the
$:GPU_PARALLEL_LOOP, causing concurrent threads to race when reading
patch_ib(patch_id); update the GPU pragma for s_find_ghost_points to include
patch_id in the private list (alongside i,j,k,ii,jj,kk,is_gp,local_idx) so each
thread gets its own patch_id before using patch_ib(patch_id)%... .

441-534: ⚠️ Potential issue | 🟠 Major

Fix pointer association and silent out-of-bounds risk in GPU kernel image-point search.

Inside the GPU-parallel loop, s_cc => x_cc/y_cc/z_cc is pointer association within device code, and the bounds safety check (lines 509–521) is compiled out for GPU builds (#if !defined(MFC_OpenACC)). This means the do while loop can walk index beyond [-buff_size, bound] silently, storing invalid values in ip_grid without error or warning on GPU.

Recommended fixes:

  • Add explicit bounds clamping inside the loop: if index reaches the boundary, exit and clamp to a valid index.
  • Alternatively, replace the linear search with a binary search over monotone s_cc for both safety and performance.
Current unsafe loop (lines 506–524)
do while ((temp_loc < s_cc(index) &
           .or. temp_loc > s_cc(index + 1)))
    index = index + dir
#if !defined(MFC_OpenACC) && !defined(MFC_OpenMP)
    if (index < -buff_size .or. index > bound) then
        ! error handling (skipped on GPU)
        error stop "Ghost Point and Image Point on Different Processors"
    end if
#endif
end do
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 441 - 534, s_compute_image_points
contains a GPU-parallel do-while that advances index using pointer s_cc and can
run out-of-bounds on device builds because the host-only error check is compiled
out; fix by adding an in-device-safe boundary clamp or replace the linear scan
with a binary search over s_cc. Specifically, inside the loop that updates index
(references: s_cc, index, dir, temp_loc, bound, buff_size) add an explicit check
that if index reaches the lower/upper limits (-buff_size or bound) you break the
loop and clamp index to the nearest valid cell before assigning
ghost_points_in(q)%ip_grid(:); alternatively, implement a device-safe
binary_search helper that takes s_cc and temp_loc and returns the correct index
and use that in place of the do-while to guarantee safety and better
performance.

967-1120: ⚠️ Potential issue | 🟠 Major

Add GPU_UPDATE(device=...) after MPI reductions to maintain self-contained GPU synchronization.

The MPI reductions at lines 1108–1109 occur on the host, followed by host-side writes to patch_ib(i)%force and patch_ib(i)%torque. While the caller (s_update_mib at line 940 in m_time_steppers.fpp) does eventually sync patch_ib to device, making s_compute_ib_forces self-contained by adding GPU_UPDATE(device='[patch_ib(1:num_ibs)]') after the final loop ensures coherence is handled within the function and aligns with the guideline: "Wrap MPI calls with GPU_UPDATE to ensure GPU-host synchronization before and after MPI communication."

Suggested fix
         do i = 1, num_ibs
             patch_ib(i)%force(:) = forces(i, :)
             patch_ib(i)%torque(:) = matmul(patch_ib(i)%rotation_matrix_inverse, torques(i, :))
         end do
+
+        ! Sync updated patch_ib to device for GPU kernels
+        $:GPU_UPDATE(device='[patch_ib(1:num_ibs)]')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 967 - 1120, The MPI reductions in
s_compute_ib_forces (calls to s_mpi_allreduce_vectors_sum for forces and
torques) are host-side and must be bracketed by a GPU update to ensure
device/host coherence; insert GPU_UPDATE(device='[patch_ib(1:num_ibs)]')
immediately after the two s_mpi_allreduce_vectors_sum calls and before any
writes to patch_ib(i)%force or patch_ib(i)%torque so the routine handles
synchronization itself.
src/common/m_model.fpp (1)

498-576: ⚠️ Potential issue | 🟠 Major

Ray direction generation is incorrect and risks biased/degenerate rays.

Ray directions in both f_model_is_inside (line 550) and f_model_is_inside_flat (line 608) are computed as point(k) + f_model_random_number(rand_seed) - 0.5_wp. This couples the direction to absolute coordinates, causing heavy bias for large coordinate values. Additionally, both functions normalize without guarding against zero-length vectors, risking division by zero.

Ray directions should be independent of the point's absolute location—only the ray origin should depend on point. The fix removes the point(k) term and adds max(..., sgm_eps) guards on normalization (consistent with existing patterns in the codebase):

Proposed changes
Line 550 (f_model_is_inside):
-                ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+                ray_dirs(i, k) = f_model_random_number(rand_seed) - 0.5_wp

Line 552:
-            ray_dirs(i, :) = ray_dirs(i, :)/sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :)))
+            ray_dirs(i, :) = ray_dirs(i, :)/max(sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :))), sgm_eps)

Line 608 (f_model_is_inside_flat):
-                dir(k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+                dir(k) = f_model_random_number(rand_seed) - 0.5_wp

Lines 610–611:
-            dir_mag = sqrt(dir(1)*dir(1) + dir(2)*dir(2) + dir(3)*dir(3))
-            dir(:) = dir(:)/dir_mag
+            dir_mag = sqrt(dir(1)*dir(1) + dir(2)*dir(2) + dir(3)*dir(3))
+            dir(:) = dir(:)/max(dir_mag, sgm_eps)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 498 - 576, The ray direction generation
in f_model_is_inside (and similarly f_model_is_inside_flat) incorrectly adds
point(k) to the random direction (coupling direction to absolute coordinates)
and normalizes without guarding against zero length; fix by computing
ray_dirs(i,k) = f_model_random_number(rand_seed) - 0.5_wp (no point(k) term) and
when normalizing replace direct division by sqrt(sum(...)) with division by
max(sqrt(sum(ray_dirs(i,:)*ray_dirs(i,:))), sgm_eps_wp) (or the project-wide
small epsilon symbol) to avoid divide-by-zero; update both f_model_is_inside and
f_model_is_inside_flat and keep ray_origins assignment as-is using
f_model_random_number(rand_seed).
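
To make the suggested behavior concrete, below is a small, self-contained Python sketch of a ray-casting inside/outside test with directions drawn around zero and an epsilon-guarded normalization. It is illustrative only: the intersection routine, helper names, and voting scheme are not taken from m_model.fpp and may differ from what f_model_is_inside/f_model_is_inside_flat actually do.

import numpy as np

EPS = 1e-12

def ray_hits_triangle(origin, direction, v0, v1, v2):
    """Moller-Trumbore ray/triangle intersection; True on a forward hit."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = np.dot(e1, p)
    if abs(det) < EPS:                       # ray parallel to triangle plane
        return False
    inv = 1.0 / det
    s = origin - v0
    u = np.dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return False
    q = np.cross(s, e1)
    v = np.dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return False
    return np.dot(e2, q) * inv > EPS         # hit must lie in front of the origin

def point_inside(point, triangles, rng, n_rays=8, spacing=0.0):
    """Crossing-number test: inside if most rays cross the surface an odd number of times."""
    inside_votes = 0
    for _ in range(n_rays):
        # jitter the origin (optionally) to sample a volume fraction, as in the PR
        origin = point + (rng.random(3) - 0.5) * spacing
        # direction is random about zero -- NOT offset by the point's coordinates
        direction = rng.random(3) - 0.5
        direction /= max(np.linalg.norm(direction), EPS)   # guard against zero length
        crossings = sum(ray_hits_triangle(origin, direction, *tri) for tri in triangles)
        inside_votes += crossings % 2
    return inside_votes > n_rays // 2

# Hypothetical usage with a unit tetrahedron
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
triangles = [tuple(verts[i] for i in f) for f in faces]
rng = np.random.default_rng(0)
print(point_inside(np.array([0.2, 0.2, 0.2]), triangles, rng))   # True
print(point_inside(np.array([2.0, 2.0, 2.0]), triangles, rng))   # False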
src/simulation/m_ib_patches.fpp (2)

169-323: ⚠️ Potential issue | 🟠 Major

Add loop-local scalars xa, yc, and dycdxc to the private list in GPU_PARALLEL_LOOP.

Lines 265–266: These three scalars are assigned per-iteration inside the GPU-parallel region (lines 274–280) but are not declared private, creating a race condition where multiple GPU threads write to the same global variables. Update the private clause to private='[i,j,xy_local,k,f,xa,yc,dycdxc]'.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 169 - 323, The
GPU_PARALLEL_LOOP in subroutine s_ib_airfoil currently omits loop-local scalars
xa, yc, and dycdxc from its private list, causing a race condition; update the
GPU_PARALLEL_LOOP directive's private clause to include xa, yc, and dycdxc (so
it becomes private='[i,j,xy_local,k,f,xa,yc,dycdxc]') so each GPU thread gets
its own copies when evaluating the airfoil logic that assigns xa, yc, and dycdxc
and writes into ib_markers%sf.

28-112: ⚠️ Potential issue | 🟠 Major

Public API change: s_apply_ib_patches now takes type(integer_field) — all callers verified and updated correctly.

s_apply_ib_patches changed its argument type (line 60). Both call sites in src/simulation/m_ibm.fpp (lines 114, 945) correctly pass type(integer_field) as argument. This breaking signature change affects the public API and is appropriately flagged for review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 28 - 112, s_apply_ib_patches
now expects an argument of type(integer_field) (the signature changed), so
update all call sites and any interface/wrapper that referenced the old
signature; specifically ensure the calls in m_ibm.fpp (the two places that were
already updated) match the new prototype and remove/adjust any legacy
declarations or interface blocks that declared s_apply_ib_patches with the old
argument type, and update any documentation/comments that describe
s_apply_ib_patches to reflect it takes a type(integer_field) argument.
🧹 Nitpick comments (2)
src/common/m_constants.fpp (1)

25-25: Run all three targets after updating shared constants.

This change is in src/common/, so please ensure pre_process, simulation, and post_process targets are exercised to catch any cross-target regressions.
Based on learnings: “Changes to src/common/ affect all three executables (pre_process, simulation, post_process); test all three targets.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_constants.fpp` at line 25, You changed the shared constant
num_patches_max in src/common/m_constants.fpp which can impact all executables;
rebuild and run the pre_process, simulation, and post_process targets (or their
test suites) to exercise cross-target behavior and catch regressions —
specifically rebuild the three binaries that consume num_patches_max
(pre_process, simulation, post_process), run their full test inputs or
integration runs, and verify no runtime assertions, array bounds, or config
mismatches occur after the change.
src/simulation/m_ibm.fpp (1)

536-586: Use reduction instead of atomic operations to improve scaling for grid-cell classification.

num_gps_local and num_inner_gps_local are currently incremented via GPU_ATOMIC(update) for every classified cell, causing high contention on large grids. Since these are simple sums, use GPU_PARALLEL_LOOP reductions instead:

$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,ii,jj,kk,is_gp]', firstprivate='[gp_layers,gp_layers_z]', &
    & reduction='[[num_gps_local, num_inner_gps_local]]', reductionOp='[+, +]')

Replace the copy directive and remove both GPU_ATOMIC calls. Gang/vector reductions avoid per-thread atomics and scale significantly better.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 536 - 586, The loop in subroutine
s_find_num_ghost_points uses per-cell GPU_ATOMIC updates on num_gps_local and
num_inner_gps_local causing contention; replace the copy-based GPU_PARALLEL_LOOP
with a reduction-based directive so the counts are accumulated via GPU
reductions instead of atomics (use GPU_PARALLEL_LOOP with collapse=3,
private='[i,j,k,ii,jj,kk,is_gp]', firstprivate='[gp_layers,gp_layers_z]' and a
reduction clause for [num_gps_local, num_inner_gps_local] with '+'), remove the
two GPU_ATOMIC(atomic='update') regions and keep the rest of the
ghost-classification logic unchanged, then assign num_gps_out and
num_inner_gps_out from the reduced locals as before.
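
The reduction-versus-atomics point can be seen in miniature with a host-side counting example (purely illustrative; the actual change would be to the GPU_PARALLEL_LOOP/GPU_ATOMIC fpp directives, not Python): instead of every thread bumping one shared counter, each cell contributes a 0/1 flag that a single reduction sums.

import numpy as np

def count_tagged_cells_counter_style(markers):
    """Serial stand-in for the per-cell atomic-increment pattern."""
    count = 0
    for value in markers.ravel():
        if value > 0:           # cell tagged by an immersed boundary
            count += 1          # every hit updates the same counter
    return count

def count_tagged_cells_reduction_style(markers):
    """Per-cell 0/1 contributions combined by one sum reduction."""
    return int(np.count_nonzero(markers > 0))

markers = np.zeros((64, 64, 64), dtype=np.int32)
markers[10:20, 10:20, 10:20] = 1     # a fake tagged region
assert count_tagged_cells_counter_style(markers) == count_tagged_cells_reduction_style(markers) == 1000

On a GPU the same idea lets gang/vector lanes accumulate partial sums privately and combine them once, avoiding contention on num_gps_local and num_inner_gps_local.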
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/common/m_derived_types.fpp`:
- Around line 186-199: The t_model_array derived type declares scalar fields
boundary_edge_count, total_vertices, interpolate and ntrs without defaults which
can leave garbage values; add default initializers for these scalars in the type
definition (set boundary_edge_count=0, total_vertices=0, interpolate=0, ntrs=0)
so all instances start deterministic before any assignment, and then run and
verify pre_process, post_process and simulation builds/tests to ensure the
initialization change doesn't break their workflows; update any constructors or
code that relied on uninitialized behavior if necessary.

In `@src/common/m_model.fpp`:
- Around line 1209-1267: The subroutine f_distance_normals_3D_flat can index
trs_n(:, tri_idx, pid) with tri_idx==0 when ntrs <= 0; add a guard at the top of
f_distance_normals_3D_flat that checks if ntrs <= 0 and in that case set
distance to a safe value (e.g. a large positive sentinel or 0._wp as
appropriate) and set normals to a defined default (e.g. zero vector) and return
immediately to avoid using tri_idx/uninitialized indices; ensure you reference
and update tri_idx, distance and normals in this early-return path so no
out-of-bounds access to trs_n occurs.
- Around line 20-42: Add a module finalizer in m_model.fpp (e.g., subroutine
s_finalize_model or a MODULE PROCEDURE bound to final) that checks
ALLOCATED(...) and calls DEALLOCATE(...) for all public allocatable arrays
gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_boundary_v, gpu_interpolated_boundary_v,
gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices, stl_bounding_boxes
and models to ensure GPU/host memory is freed during simulation cleanup; export
or document the finalizer so the cleanup path in your driver or cleanup routine
calls it. Also remove the unnecessary "use m_model" import from
src/pre_process/m_icpp_patches.fpp (or the specific pre_process file mentioned)
since it does not reference the gpu_* arrays or models, eliminating the unwanted
coupling.
- Around line 1402-1415: s_pack_model_for_gpu currently unconditionally
allocates ma%trs_v and ma%trs_n; make it re-entrant by checking existing
allocations before allocating: set ma%ntrs = ma%model%ntrs, then for ma%trs_v
and ma%trs_n use allocated(...) to detect prior allocation and only allocate if
not allocated or if the current allocation dimensions don't match ma%ntrs (in
that case deallocate then allocate with the correct bounds); ensure you copy
into ma%trs_v(:, :, i) and ma%trs_n(:, i) after the allocation checks complete.

In `@src/simulation/m_ib_patches.fpp`:
- Around line 954-1169: s_instantiate_STL_models has four fixes: allocate
stl_bounding_boxes once before the patch loop with shape (num_ibs,1:3,1:3)
instead of calling allocate(stl_bounding_boxes(patch_id,...)) inside the loop;
ensure total_vertices is initialized or only copied into
models(patch_id)%total_vertices when interpolate is true (do not propagate
uninitialized total_vertices when interpolation is skipped); pass spacing
magnitudes to the interpolation routines by using (/dx_local, dy_local,
dz_local/) when calling f_interpolate_3D/f_interpolate_2D (and remove or fix the
unconditional interpolate = .false. that currently masks interpolation logic);
and protect module-level gpu_* allocations (gpu_ntrs, gpu_trs_v, gpu_trs_n,
gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices, gpu_boundary_v,
gpu_interpolated_boundary_v) with if (.not. allocated(...)) checks before each
allocate to allow repeated calls to s_instantiate_STL_models.
- Around line 828-953: s_update_mib’s restart path must refresh the
device-resident model arrays before generating markers: ensure
s_instantiate_STL_models() (or an equivalent GPU_ENTER_DATA/refresh routine for
module arrays gpu_ntrs, gpu_trs_v, gpu_trs_n) is called or re-entered before
s_apply_ib_patches() in the restart branch so device data reflects any changes
from s_update_ib_rotation_matrix(); in s_ib_model and s_ib_3d_model, add
explicit device presence to the GPU_PARALLEL_LOOP clauses by including
present='[gpu_ntrs, gpu_trs_v, gpu_trs_n]' (in addition to existing copyin) to
avoid relying on implicit persistence; and narrow the 2D scan in s_ib_model by
reusing the bounding-index logic from s_ib_3d_model (compute corner_distance
from stl_bounding_boxes and call get_bounding_indices to set i/j limits) to
avoid scanning the full grid for sparse STL patches.

In `@toolchain/mfc/case_validator.py`:
- Around line 687-688: The error message passed to self.prohibit for the num_ibs
check is stale (mentions 10) while the actual upper bound is 1000; update the
message to reflect the real max and, to prevent future drift, construct the
message using the authoritative constraint value (e.g. the CONSTRAINTS constant
or the class attribute used for bounds) rather than a hardcoded literal—locate
the self.prohibit call that checks num_ibs (the line with self.prohibit(ib and
(num_ibs <= 0 or num_ibs > 1000), ...)) and interpolate/format the upper-bound
from CONSTRAINTS (or self.CONSTRAINTS / CONSTRAINTS['num_ibs']['max'] as
appropriate) so the message reads "num_ibs must be between 1 and <max>".

---

Outside diff comments:
In `@src/common/m_model.fpp`:
- Around line 498-576: The ray direction generation in f_model_is_inside (and
similarly f_model_is_inside_flat) incorrectly adds point(k) to the random
direction (coupling direction to absolute coordinates) and normalizes without
guarding against zero length; fix by computing ray_dirs(i,k) =
f_model_random_number(rand_seed) - 0.5_wp (no point(k) term) and when
normalizing replace direct division by sqrt(sum(...)) with division by
max(sqrt(sum(ray_dirs(i,:)*ray_dirs(i,:))), sgm_eps_wp) (or the project-wide
small epsilon symbol) to avoid divide-by-zero; update both f_model_is_inside and
f_model_is_inside_flat and keep ray_origins assignment as-is using
f_model_random_number(rand_seed).

In `@src/simulation/m_compute_levelset.fpp`:
- Around line 641-709: The integer variable interpolate in subroutine
s_model_levelset is being used as a logical in the 3D branch (if (interpolate)
then) which will not compile; update that conditional to explicitly compare the
integer (e.g., if (interpolate == 1) then) or change the variable to a logical
and adjust all uses accordingly so that the 3D path (the block that currently
calls f_interpolated_distance or assigns gp%levelset = distance) uses a valid
boolean check on interpolate.

In `@src/simulation/m_ib_patches.fpp`:
- Around line 169-323: The GPU_PARALLEL_LOOP in subroutine s_ib_airfoil
currently omits loop-local scalars xa, yc, and dycdxc from its private list,
causing a race condition; update the GPU_PARALLEL_LOOP directive's private
clause to include xa, yc, and dycdxc (so it becomes
private='[i,j,xy_local,k,f,xa,yc,dycdxc]') so each GPU thread gets its own
copies when evaluating the airfoil logic that assigns xa, yc, and dycdxc and
writes into ib_markers%sf.
- Around line 28-112: s_apply_ib_patches now expects an argument of
type(integer_field) (the signature changed), so update all call sites and any
interface/wrapper that referenced the old signature; specifically ensure the
calls in m_ibm.fpp (the two places that were already updated) match the new
prototype and remove/adjust any legacy declarations or interface blocks that
declared s_apply_ib_patches with the old argument type, and update any
documentation/comments that describe s_apply_ib_patches to reflect it takes a
type(integer_field) argument.

In `@src/simulation/m_ibm.fpp`:
- Around line 589-676: The variable patch_id in s_find_ghost_points is written
per-iteration but not listed as private in the $:GPU_PARALLEL_LOOP, causing
concurrent threads to race when reading patch_ib(patch_id); update the GPU
pragma for s_find_ghost_points to include patch_id in the private list
(alongside i,j,k,ii,jj,kk,is_gp,local_idx) so each thread gets its own patch_id
before using patch_ib(patch_id)%... .
- Around line 441-534: s_compute_image_points contains a GPU-parallel do-while
that advances index using pointer s_cc and can run out-of-bounds on device
builds because the host-only error check is compiled out; fix by adding an
in-device-safe boundary clamp or replace the linear scan with a binary search
over s_cc. Specifically, inside the loop that updates index (references: s_cc,
index, dir, temp_loc, bound, buff_size) add an explicit check that if index
reaches the lower/upper limits (-buff_size or bound) you break the loop and
clamp index to the nearest valid cell before assigning
ghost_points_in(q)%ip_grid(:); alternatively, implement a device-safe
binary_search helper that takes s_cc and temp_loc and returns the correct index
and use that in place of the do-while to guarantee safety and better
performance.
- Around line 967-1120: The MPI reductions in s_compute_ib_forces (calls to
s_mpi_allreduce_vectors_sum for forces and torques) are host-side and must be
bracketed by a GPU update to ensure device/host coherence; insert
GPU_UPDATE(device='[patch_ib(1:num_ibs)]') immediately after the two
s_mpi_allreduce_vectors_sum calls and before any writes to patch_ib(i)%force or
patch_ib(i)%torque so the routine handles synchronization itself.

In `@src/simulation/m_time_steppers.fpp`:
- Around line 805-855: The NVTX range label passed to nvtxStartRange in the
MTimeSteppers routine is misspelled ("PROPAGATE-IMERSED-BOUNDARIES"); update the
string to the correct label "PROPAGATE-IMMERSED-BOUNDARIES" so profiling tools
group this region correctly (look for the nvtxStartRange(...) call in the
do-loop that updates patch_ib and ensure nvtxEndRange remains paired).

---

Nitpick comments:
In `@src/common/m_constants.fpp`:
- Line 25: You changed the shared constant num_patches_max in
src/common/m_constants.fpp which can impact all executables; rebuild and run the
pre_process, simulation, and post_process targets (or their test suites) to
exercise cross-target behavior and catch regressions — specifically rebuild the
three binaries that consume num_patches_max (pre_process, simulation,
post_process), run their full test inputs or integration runs, and verify no
runtime assertions, array bounds, or config mismatches occur after the change.

In `@src/simulation/m_ibm.fpp`:
- Around line 536-586: The loop in subroutine s_find_num_ghost_points uses
per-cell GPU_ATOMIC updates on num_gps_local and num_inner_gps_local causing
contention; replace the copy-based GPU_PARALLEL_LOOP with a reduction-based
directive so the counts are accumulated via GPU reductions instead of atomics
(use GPU_PARALLEL_LOOP with collapse=3, private='[i,j,k,ii,jj,kk,is_gp]',
firstprivate='[gp_layers,gp_layers_z]' and a reduction clause for
[num_gps_local, num_inner_gps_local] with '+'), remove the two
GPU_ATOMIC(atomic='update') regions and keep the rest of the
ghost-classification logic unchanged, then assign num_gps_out and
num_inner_gps_out from the reduced locals as before.

Comment on lines 186 to 199
type :: t_model_array
! Original CPU-side fields (unchanged)
type(t_model), allocatable :: model
real(wp), allocatable, dimension(:, :, :) :: boundary_v
real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
integer :: boundary_edge_count
integer :: total_vertices
-        logical :: interpolate
+        integer :: interpolate

! GPU-friendly flattened arrays
integer :: ntrs ! copy of model%ntrs
real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
end type t_model_array

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check the current state of the file at the specified lines
cat -n src/common/m_derived_types.fpp | sed -n '180,210p'

Repository: MFlowCode/MFC

Length of output: 1607


🏁 Script executed:

# Search for all uses of %interpolate across the codebase
rg -nP '%interpolate\b' src --type-add 'fpp:*.fpp' --type 'fpp'

Repository: MFlowCode/MFC

Length of output: 344


🏁 Script executed:

# Look for suspicious logical patterns with %interpolate
rg -nP '%interpolate\b.*(\.(true|false)|\.not\.)' src --type-add 'fpp:*.fpp' --type 'fpp'

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

# Check if there are any conditional patterns that may need review
rg -nP 'if\s*\(.*%interpolate' src --type-add 'fpp:*.fpp' --type 'fpp' -A 2

Repository: MFlowCode/MFC

Length of output: 854


🏁 Script executed:

# Check for assignments to %interpolate in the codebase
rg -nP '%interpolate\s*=' src --type-add 'fpp:*.fpp' --type 'fpp' -B 1 -A 1

Repository: MFlowCode/MFC

Length of output: 429


t_model_array scalars must be initialized to prevent garbage values reaching GPU or uninitialized field reads.

All uses of %interpolate correctly employ integer semantics (0/1, no logical operators); however, the type definition still lacks default initialization for boundary_edge_count, total_vertices, interpolate, and ntrs. Uninitialized scalars can leak garbage into GPU staging arrays or cause undefined behavior if any code path reads these fields before assignment.

Apply default initialization
         integer :: boundary_edge_count = 0
         integer :: total_vertices = 0
         integer :: interpolate = 0

         integer :: ntrs = 0  ! copy of model%ntrs

Additionally, verify that changes do not break pre_process and post_process executables, not just simulation.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
type :: t_model_array
! Original CPU-side fields (unchanged)
type(t_model), allocatable :: model
real(wp), allocatable, dimension(:, :, :) :: boundary_v
real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
integer :: boundary_edge_count
integer :: total_vertices
logical :: interpolate
integer :: interpolate
! GPU-friendly flattened arrays
integer :: ntrs ! copy of model%ntrs
real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
end type t_model_array
type :: t_model_array
! Original CPU-side fields (unchanged)
type(t_model), allocatable :: model
real(wp), allocatable, dimension(:, :, :) :: boundary_v
real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
integer :: boundary_edge_count = 0
integer :: total_vertices = 0
integer :: interpolate = 0
! GPU-friendly flattened arrays
integer :: ntrs = 0 ! copy of model%ntrs
real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
end type t_model_array
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_derived_types.fpp` around lines 186 - 199, The t_model_array
derived type declares scalar fields boundary_edge_count, total_vertices,
interpolate and ntrs without defaults which can leave garbage values; add
default initializers for these scalars in the type definition (set
boundary_edge_count=0, total_vertices=0, interpolate=0, ntrs=0) so all instances
start deterministic before any assignment, and then run and verify pre_process,
post_process and simulation builds/tests to ensure the initialization change
doesn't break their workflows; update any constructors or code that relied on
uninitialized behavior if necessary.

Comment on lines +20 to 42
public :: f_model_read, s_model_write, s_model_free, f_model_is_inside, models, gpu_ntrs, &
gpu_trs_v, gpu_trs_n, gpu_boundary_v, gpu_interpolated_boundary_v, gpu_interpolate, gpu_boundary_edge_count, &
gpu_total_vertices, stl_bounding_boxes

! Subroutines for STL immersed boundaries
public :: f_check_boundary, f_register_edge, f_check_interpolation_2D, &
f_check_interpolation_3D, f_interpolate_2D, f_interpolate_3D, &
f_interpolated_distance, f_normals, f_distance, f_distance_normals_3D, f_tri_area
f_interpolated_distance, f_normals, f_distance, f_distance_normals_3D, f_tri_area, s_pack_model_for_gpu, &
f_model_is_inside_flat, f_distance_normals_3d_flat

!! array of STL models that can be allocated and then used in IB marker and levelset compute
type(t_model_array), allocatable, target :: models(:)
!! GPU-friendly flat arrays for STL model data
integer, allocatable :: gpu_ntrs(:)
real(wp), allocatable, dimension(:, :, :, :) :: gpu_trs_v
real(wp), allocatable, dimension(:, :, :) :: gpu_trs_n
real(wp), allocatable, dimension(:, :, :, :) :: gpu_boundary_v
real(wp), allocatable, dimension(:, :, :) :: gpu_interpolated_boundary_v
integer, allocatable :: gpu_interpolate(:)
integer, allocatable :: gpu_boundary_edge_count(:)
integer, allocatable :: gpu_total_vertices(:)
real(wp), allocatable :: stl_bounding_boxes(:, :, :)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find allocation and deallocation sites for the new module variables
echo "=== Allocation/Deallocation patterns ==="
rg -nP --type fpp '(allocate|deallocate)\s*\(' src/common/m_model.fpp | head -30

echo -e "\n=== GPU array allocations in entire src/common ==="
rg -nP --type fpp 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|gpu_interpolate|gpu_boundary_edge_count|gpu_total_vertices|stl_bounding_boxes' src/common/ | grep -i allocate | head -20

echo -e "\n=== Where m_model is imported ==="
rg -n 'use.*m_model' src/ | head -20

Repository: MFlowCode/MFC

Length of output: 622


🏁 Script executed:

#!/bin/bash
# Check the finalization routine referenced in the public interface
echo "=== s_model_free subroutine definition ==="
rg -A 50 'subroutine\s+s_model_free' src/common/m_model.fpp | head -60

echo -e "\n=== s_model_write subroutine definition ==="
rg -A 30 'subroutine\s+s_model_write' src/common/m_model.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 2632


🏁 Script executed:

#!/bin/bash
# Check if pre_process or post_process reference the new public variables or m_model
echo "=== pre_process usage of m_model ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|stl_bounding_boxes|models' src/pre_process/ 2>/dev/null | head -10

echo -e "\n=== post_process usage of m_model ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|stl_bounding_boxes|models' src/post_process/ 2>/dev/null | head -10

echo -e "\n=== Check if pre_process or post_process import m_model ==="
rg -n 'use.*m_model' src/pre_process/ src/post_process/ 2>/dev/null

Repository: MFlowCode/MFC

Length of output: 518


🏁 Script executed:

#!/bin/bash
# Get the full m_model.fpp file to understand context
wc -l src/common/m_model.fpp

Repository: MFlowCode/MFC

Length of output: 84


🏁 Script executed:

#!/bin/bash
# Read the module to find allocation patterns
# Focus on where the public arrays are allocated
sed -n '20,100p' src/common/m_model.fpp | cat -n

Repository: MFlowCode/MFC

Length of output: 3661


🏁 Script executed:

#!/bin/bash
# Search for allocation patterns - try broader search
rg -n 'allocate' src/common/m_model.fpp | head -30

Repository: MFlowCode/MFC

Length of output: 786


🏁 Script executed:

#!/bin/bash
# Look for initialization subroutines that might allocate these arrays
rg -n 'subroutine.*init\|subroutine.*allocate\|subroutine.*setup' src/common/m_model.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for where models(:) is allocated
rg -B 3 -A 3 'models\s*\(' src/common/m_model.fpp | head -50

Repository: MFlowCode/MFC

Length of output: 430


🏁 Script executed:

#!/bin/bash
# Look for gpu_ntrs allocation pattern
rg -B 3 -A 3 'gpu_ntrs\s*\(' src/common/m_model.fpp | head -50

Repository: MFlowCode/MFC

Length of output: 490


🏁 Script executed:

#!/bin/bash
# Check how these are used in simulation to understand lifecycle
echo "=== Usage in m_ibm.fpp ==="
rg -B 2 -A 2 'gpu_ntrs\|gpu_trs_v\|gpu_boundary_v\|models\(' src/simulation/m_ibm.fpp | head -40

echo -e "\n=== Usage in pre_process/m_icpp_patches.fpp ==="
rg -B 2 -A 2 'models\(' src/pre_process/m_icpp_patches.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 132


🏁 Script executed:

#!/bin/bash
# Search for allocation of the MODULE-LEVEL arrays in calling modules
echo "=== Search for allocate of gpu_ntrs in entire codebase ==="
rg 'allocate.*gpu_ntrs\|gpu_ntrs.*allocate' src/ | head -20

echo -e "\n=== Search for s_pack_model_for_gpu ==="
rg -n 's_pack_model_for_gpu' src/ | head -20

echo -e "\n=== Look for where gpu arrays are assigned/populated ==="
rg -n 'gpu_ntrs\s*=' src/ | head -20

Repository: MFlowCode/MFC

Length of output: 591


🏁 Script executed:

#!/bin/bash
# Search for deallocate of module-level arrays
echo "=== Deallocate of module-level gpu arrays ==="
rg 'deallocate.*gpu_ntrs\|deallocate.*gpu_trs_v\|deallocate.*models\(' src/ | head -20

echo -e "\n=== Look for finalization in m_ibm.fpp ==="
rg -n 'finalize\|cleanup\|deallocate' src/simulation/m_ibm.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 146


🏁 Script executed:

#!/bin/bash
# Look at s_pack_model_for_gpu implementation
echo "=== s_pack_model_for_gpu subroutine ==="
rg -A 80 'subroutine\s+s_pack_model_for_gpu' src/common/m_model.fpp | head -100

Repository: MFlowCode/MFC

Length of output: 519


🏁 Script executed:

#!/bin/bash
# Final verification: confirm module-level arrays have no lifecycle
echo "=== Complete scan for any allocation pattern ==="
rg 'allocate\s*\(' src/common/m_model.fpp src/simulation/m_ib_patches.fpp src/simulation/m_ibm.fpp | grep -E 'gpu_ntrs|gpu_trs_v|gpu_boundary_v|models\s*\(' || echo "NO ALLOCATIONS FOUND for public module arrays"

echo -e "\n=== Confirm these are truly module-level (public scope) ==="
rg -B 1 'integer.*allocatable.*gpu_ntrs|real.*allocatable.*gpu_trs' src/common/m_model.fpp

Repository: MFlowCode/MFC

Length of output: 763


🏁 Script executed:

#!/bin/bash
# Search for deallocation of these specific arrays
echo "=== Deallocate patterns for public module arrays ==="
rg -n 'deallocate.*gpu_ntrs\|deallocate.*gpu_trs_v\|deallocate.*gpu_boundary_v\|deallocate.*models' src/ | head -20

echo -e "\n=== Broader search for deallocate in m_ib_patches.fpp ==="
rg -n 'deallocate' src/simulation/m_ib_patches.fpp | head -20

echo -e "\n=== Context around gpu array allocations in m_ib_patches.fpp ==="
rg -B 5 -A 10 'allocate.*gpu_ntrs' src/simulation/m_ib_patches.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 890


🏁 Script executed:

#!/bin/bash
# Check what pre_process actually uses from m_model
echo "=== What does pre_process use from m_model? ==="
sed -n '1,50p' src/pre_process/m_icpp_patches.fpp | grep -A 15 'use m_model'

echo -e "\n=== Any direct usage of gpu_* or models arrays in pre_process? ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_boundary_v|models\(' src/pre_process/m_icpp_patches.fpp | head -10

Repository: MFlowCode/MFC

Length of output: 553


Add deallocate() calls for all public module arrays in a finalization routine, and remove unnecessary m_model import from pre_process.

The public module arrays (gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices) are allocated in m_ib_patches.fpp but never deallocated, causing GPU memory leaks. Create a finalization subroutine in m_model.fpp to deallocate these arrays and call it during simulation cleanup. Additionally, src/pre_process/m_icpp_patches.fpp imports m_model but does not use any of the gpu or models arrays—remove this unnecessary coupling per src/common/ guidelines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 20 - 42, Add a module finalizer in
m_model.fpp (e.g., subroutine s_finalize_model or a MODULE PROCEDURE bound to
final) that checks ALLOCATED(...) and calls DEALLOCATE(...) for all public
allocatable arrays gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_boundary_v,
gpu_interpolated_boundary_v, gpu_interpolate, gpu_boundary_edge_count,
gpu_total_vertices, stl_bounding_boxes and models to ensure GPU/host memory is
freed during simulation cleanup; export or document the finalizer so the cleanup
path in your driver or cleanup routine calls it. Also remove the unnecessary
"use m_model" import from src/pre_process/m_icpp_patches.fpp (or the specific
pre_process file mentioned) since it does not reference the gpu_* arrays or
models, eliminating the unwanted coupling.
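
A minimal sketch of such a finalizer, assuming it is added to m_model.fpp next to the declarations quoted above and that the simulation cleanup path calls it exactly once; the subroutine name is illustrative, and releasing the matching device copies would additionally need the project's GPU exit-data mechanism, which is not shown here.

    subroutine s_finalize_model_module()

        ! Free every public host-side array this module owns; each deallocate is
        ! guarded so the routine is safe to call even if allocation never happened.
        if (allocated(gpu_ntrs)) deallocate (gpu_ntrs)
        if (allocated(gpu_trs_v)) deallocate (gpu_trs_v)
        if (allocated(gpu_trs_n)) deallocate (gpu_trs_n)
        if (allocated(gpu_boundary_v)) deallocate (gpu_boundary_v)
        if (allocated(gpu_interpolated_boundary_v)) deallocate (gpu_interpolated_boundary_v)
        if (allocated(gpu_interpolate)) deallocate (gpu_interpolate)
        if (allocated(gpu_boundary_edge_count)) deallocate (gpu_boundary_edge_count)
        if (allocated(gpu_total_vertices)) deallocate (gpu_total_vertices)
        if (allocated(stl_bounding_boxes)) deallocate (stl_bounding_boxes)
        if (allocated(models)) deallocate (models)

    end subroutine s_finalize_model_module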

Comment on lines +1209 to +1267
subroutine f_distance_normals_3D_flat(ntrs, trs_v, trs_n, pid, point, normals, distance)

$:GPU_ROUTINE(parallelism='[seq]')

integer, intent(in) :: ntrs
real(wp), dimension(:, :, :, :), intent(in) :: trs_v
real(wp), dimension(:, :, :), intent(in) :: trs_n
integer, intent(in) :: pid
real(wp), dimension(1:3), intent(in) :: point
real(wp), dimension(1:3), intent(out) :: normals
real(wp), intent(out) :: distance

real(wp), dimension(1:3, 1:3) :: tri
real(wp) :: dist_min, dist_t_min
real(wp) :: dist_min_normal, dist_buffer_normal
real(wp), dimension(1:3) :: midp
real(wp), dimension(1:3) :: dist_buffer
integer :: i, j, tri_idx

dist_min = 1.e12_wp
dist_min_normal = 1.e12_wp
distance = 0._wp

tri_idx = 0
do i = 1, ntrs
do j = 1, 3
tri(j, 1) = trs_v(j, 1, i, pid)
tri(j, 2) = trs_v(j, 2, i, pid)
tri(j, 3) = trs_v(j, 3, i, pid)
dist_buffer(j) = sqrt((point(1) - tri(j, 1))**2 + &
(point(2) - tri(j, 2))**2 + &
(point(3) - tri(j, 3))**2)
end do

do j = 1, 3
midp(j) = (tri(1, j) + tri(2, j) + tri(3, j))/3
end do

dist_t_min = minval(dist_buffer(1:3))
dist_buffer_normal = sqrt((point(1) - midp(1))**2 + &
(point(2) - midp(2))**2 + &
(point(3) - midp(3))**2)

if (dist_t_min < dist_min) then
dist_min = dist_t_min
end if

if (dist_buffer_normal < dist_min_normal) then
dist_min_normal = dist_buffer_normal
tri_idx = i
end if
end do

normals(1) = trs_n(1, tri_idx, pid)
normals(2) = trs_n(2, tri_idx, pid)
normals(3) = trs_n(3, tri_idx, pid)
distance = dist_min

end subroutine f_distance_normals_3D_flat

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -nP --type-add 'fpp:*.fpp' --type 'fpp' '\bf_distance_normals_3D_flat\s*\(' src

Repository: MFlowCode/MFC

Length of output: 345


🏁 Script executed:

sed -n '680,700p' src/simulation/m_compute_levelset.fpp

Repository: MFlowCode/MFC

Length of output: 913


🏁 Script executed:

rg -nP 'gpu_ntrs' src/simulation/m_compute_levelset.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 186


🏁 Script executed:

rg -nP '\bgpu_ntrs\b' src --type-add 'fpp:*.fpp' | grep -E '(allocate|dimension|intent|gpu_ntrs\s*=)'

Repository: MFlowCode/MFC

Length of output: 206


🏁 Script executed:

sed -n '1110,1160p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 2729


🏁 Script executed:

sed -n '650,710p' src/simulation/m_compute_levelset.fpp

Repository: MFlowCode/MFC

Length of output: 2630


Add guard in f_distance_normals_3D_flat for ntrs <= 0 to prevent out-of-bounds access.

When ntrs is 0 (which can occur if a patch has no model allocated), tri_idx remains uninitialized at 0 and is used to index trs_n(:, tri_idx, pid) at the end, causing an out-of-bounds read. The caller at src/simulation/m_compute_levelset.fpp:688 does not validate gpu_ntrs(patch_id) > 0 before invoking this function.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 1209 - 1267, The subroutine
f_distance_normals_3D_flat can index trs_n(:, tri_idx, pid) with tri_idx==0 when
ntrs <= 0; add a guard at the top of f_distance_normals_3D_flat that checks if
ntrs <= 0 and in that case set distance to a safe value (e.g. a large positive
sentinel or 0._wp as appropriate) and set normals to a defined default (e.g.
zero vector) and return immediately to avoid using tri_idx/uninitialized
indices; ensure you reference and update tri_idx, distance and normals in this
early-return path so no out-of-bounds access to trs_n occurs.
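
A minimal sketch of the guard, placed right after the declarations of the subroutine quoted above; the 1.e12_wp sentinel mirrors the initializers already used in its body, and the zero normal is an assumed safe default.

        ! Nothing to search for this patch: return defined outputs immediately so
        ! tri_idx is never used as a zero index into trs_n.
        if (ntrs <= 0) then
            normals(1:3) = 0._wp
            distance = 1.e12_wp
            return
        end if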

Comment on lines +1402 to +1415
subroutine s_pack_model_for_gpu(ma)
type(t_model_array), intent(inout) :: ma
integer :: i

ma%ntrs = ma%model%ntrs
allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
allocate (ma%trs_n(1:3, 1:ma%ntrs))

do i = 1, ma%ntrs
ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
ma%trs_n(:, i) = ma%model%trs(i)%n(:)
end do
end subroutine


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all call sites to s_pack_model_for_gpu
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' '\bs_pack_model_for_gpu\s*\(' src || true

Repository: MFlowCode/MFC

Length of output: 220


🏁 Script executed:

#!/bin/bash
# Also search for the subroutine definition and nearby code
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' -B 5 -A 20 'subroutine s_pack_model_for_gpu' src || true

Repository: MFlowCode/MFC

Length of output: 1111


🏁 Script executed:

#!/bin/bash
# Check if ma%trs_v or ma%trs_n are referenced elsewhere to understand lifecycle
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'ma%trs_v|ma%trs_n' src || true

Repository: MFlowCode/MFC

Length of output: 367


🏁 Script executed:

#!/bin/bash
# Get context around the call site in m_ib_patches.fpp
sed -n '1080,1120p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 1850


🏁 Script executed:

#!/bin/bash
# Search for surrounding loop context (broader range)
sed -n '1050,1130p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 4004


🏁 Script executed:

#!/bin/bash
# Check if models is deallocated/re-initialized somewhere
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'models\s*\(|deallocate.*models' src/simulation/ | head -30

Repository: MFlowCode/MFC

Length of output: 3130


🏁 Script executed:

#!/bin/bash
# Check if s_instantiate_STL_models or models are re-initialized in any time-stepping/simulation loop
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 's_instantiate_STL_models|allocate.*models\(' src/simulation/ | grep -v '\.fpp-' | head -20

Repository: MFlowCode/MFC

Length of output: 1135


🏁 Script executed:

#!/bin/bash
# Search for any re-initialization or deallocate of models during simulation
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'deallocate.*models|models.*=|trs_v.*deallocate|trs_n.*deallocate' src | head -20

Repository: MFlowCode/MFC

Length of output: 1318


Make s_pack_model_for_gpu re-entrant by checking allocations before allocating.

The function is called only once per object during initialization (in s_instantiate_STL_models), so there is currently no runtime risk. However, the unconditional allocations of ma%trs_v and ma%trs_n are not defensive: if the routine were ever called again on the same object, allocating an already-allocated array would raise a runtime error. Add pre-allocation checks:

Proposed fix
    ma%ntrs = ma%model%ntrs
+   if (allocated(ma%trs_v)) deallocate(ma%trs_v)
+   if (allocated(ma%trs_n)) deallocate(ma%trs_n)
    allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
    allocate (ma%trs_n(1:3, 1:ma%ntrs))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
subroutine s_pack_model_for_gpu(ma)
    type(t_model_array), intent(inout) :: ma
    integer :: i

    ma%ntrs = ma%model%ntrs
    allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
    allocate (ma%trs_n(1:3, 1:ma%ntrs))

    do i = 1, ma%ntrs
        ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
        ma%trs_n(:, i) = ma%model%trs(i)%n(:)
    end do
end subroutine

Replaced with:

subroutine s_pack_model_for_gpu(ma)
    type(t_model_array), intent(inout) :: ma
    integer :: i

    ma%ntrs = ma%model%ntrs
    if (allocated(ma%trs_v)) deallocate(ma%trs_v)
    if (allocated(ma%trs_n)) deallocate(ma%trs_n)
    allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
    allocate (ma%trs_n(1:3, 1:ma%ntrs))

    do i = 1, ma%ntrs
        ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
        ma%trs_n(:, i) = ma%model%trs(i)%n(:)
    end do
end subroutine
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 1402 - 1415, s_pack_model_for_gpu
currently unconditionally allocates ma%trs_v and ma%trs_n; make it re-entrant by
checking existing allocations before allocating: set ma%ntrs = ma%model%ntrs,
then for ma%trs_v and ma%trs_n use allocated(...) to detect prior allocation and
only allocate if not allocated or if the current allocation dimensions don't
match ma%ntrs (in that case deallocate then allocate with the correct bounds);
ensure you copy into ma%trs_v(:, :, i) and ma%trs_n(:, i) after the allocation
checks complete.

Comment on lines 828 to 953
!> The STL patch is a 2D geometry that is imported from an STL file.
!! @param patch_id is the patch identifier
!! @param ib_markers_sf Array to track patch ids
!! @param STL_levelset STL levelset
!! @param STL_levelset_norm STL levelset normals
subroutine s_ib_model(patch_id, ib_markers_sf)
!! @param ib_markers Array to track patch ids
subroutine s_ib_model(patch_id, ib_markers)

integer, intent(in) :: patch_id
integer, dimension(0:m, 0:n, 0:p), intent(inout) :: ib_markers_sf

integer :: i, j, k !< Generic loop iterators
type(integer_field), intent(inout) :: ib_markers

type(t_model), pointer :: model
integer :: i, j, k !< Generic loop iterators
integer :: spc

real(wp) :: eta
real(wp) :: eta, threshold
real(wp), dimension(1:3) :: point, local_point, offset
real(wp), dimension(1:3) :: center, xyz_local
real(wp), dimension(1:3) :: center, xy_local
real(wp), dimension(1:3, 1:3) :: inverse_rotation

model => models(patch_id)%model
center = 0._wp
center(1) = patch_ib(patch_id)%x_centroid
center(2) = patch_ib(patch_id)%y_centroid
if (p > 0) center(3) = patch_ib(patch_id)%z_centroid
inverse_rotation(:, :) = patch_ib(patch_id)%rotation_matrix_inverse(:, :)
offset(:) = patch_ib(patch_id)%centroid_offset(:)
spc = patch_ib(patch_id)%model_spc
threshold = patch_ib(patch_id)%model_threshold

do i = 0, m
do j = 0, n
do k = 0, p
$:GPU_PARALLEL_LOOP(private='[i,j, xy_local, eta]',&
& copyin='[patch_id,center,inverse_rotation, offset, spc, threshold]', collapse=2)
do i = -gp_layers, m + gp_layers
do j = -gp_layers, n + gp_layers

xyz_local = [x_cc(i) - center(1), y_cc(j) - center(2), 0._wp]
if (p > 0) then
xyz_local(3) = z_cc(k) - center(3)
end if
xy_local = [x_cc(i) - center(1), y_cc(j) - center(2), 0._wp]
xy_local = matmul(inverse_rotation, xy_local)
xy_local = xy_local - offset

if (grid_geometry == 3) then
xy_local = f_convert_cyl_to_cart(xy_local)
end if

eta = f_model_is_inside_flat(gpu_ntrs(patch_id), &
gpu_trs_v, gpu_trs_n, &
patch_id, &
xy_local, (/dx(i), dy(j), 0._wp/), &
spc)

! Reading STL boundary vertices and compute the levelset and levelset_norm
if (eta > threshold) then
ib_markers%sf(i, j, 0) = patch_id
end if

end do
end do
$:END_GPU_PARALLEL_LOOP()

end subroutine s_ib_model

!> The STL patch is a 3D geometry that is imported from an STL file.
!! @param patch_id is the patch identifier
!! @param ib_markers Array to track patch ids
subroutine s_ib_3d_model(patch_id, ib_markers)

integer, intent(in) :: patch_id
type(integer_field), intent(inout) :: ib_markers

integer :: i, j, k, il, ir, jl, jr, kl, kr !< Generic loop iterators
integer :: spc

real(wp) :: eta, threshold, corner_distance
real(wp), dimension(1:3) :: point, local_point, offset
real(wp), dimension(1:3) :: center, xyz_local
real(wp), dimension(1:3, 1:3) :: inverse_rotation

center = 0._wp
center(1) = patch_ib(patch_id)%x_centroid
center(2) = patch_ib(patch_id)%y_centroid
center(3) = patch_ib(patch_id)%z_centroid
inverse_rotation(:, :) = patch_ib(patch_id)%rotation_matrix_inverse(:, :)
offset(:) = patch_ib(patch_id)%centroid_offset(:)
spc = patch_ib(patch_id)%model_spc
threshold = patch_ib(patch_id)%model_threshold

il = -gp_layers
jl = -gp_layers
kl = -gp_layers
ir = m + gp_layers
jr = n + gp_layers
kr = p + gp_layers
corner_distance = 0._wp
do i = 1, 3
corner_distance = corner_distance + maxval(abs(stl_bounding_boxes(patch_id, i, 1:3)))**2 ! distance to rim of cylinder
end do
corner_distance = sqrt(corner_distance)
call get_bounding_indices(center(1) - corner_distance, center(1) + corner_distance, x_cc, il, ir)
call get_bounding_indices(center(2) - corner_distance, center(2) + corner_distance, y_cc, jl, jr)
call get_bounding_indices(center(3) - corner_distance, center(3) + corner_distance, z_cc, kl, kr)

$:GPU_PARALLEL_LOOP(private='[i,j,k, xyz_local, eta]',&
& copyin='[patch_id,center,inverse_rotation, offset, spc, threshold]', collapse=3)
do i = il, ir
do j = jl, jr
do k = kl, kr

xyz_local = [x_cc(i) - center(1), y_cc(j) - center(2), z_cc(k) - center(3)]
xyz_local = matmul(inverse_rotation, xyz_local)
xyz_local = xyz_local - offset

if (grid_geometry == 3) then
xyz_local = f_convert_cyl_to_cart(xyz_local)
end if

if (p == 0) then
eta = f_model_is_inside(model, xyz_local, (/dx(i), dy(j), 0._wp/), patch_ib(patch_id)%model_spc)
else
eta = f_model_is_inside(model, xyz_local, (/dx(i), dy(j), dz(k)/), patch_ib(patch_id)%model_spc)
end if
eta = f_model_is_inside_flat(gpu_ntrs(patch_id), &
gpu_trs_v, gpu_trs_n, &
patch_id, &
xyz_local, (/dx(i), dy(j), dz(k)/), &
spc)

! Reading STL boundary vertices and compute the levelset and levelset_norm
if (eta > patch_ib(patch_id)%model_threshold) then
ib_markers_sf(i, j, k) = patch_id
ib_markers%sf(i, j, k) = patch_id
end if

end do
end do
end do
$:END_GPU_PARALLEL_LOOP()

end subroutine s_ib_model
end subroutine s_ib_3d_model


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 1. Find s_instantiate_STL_models and s_apply_ib_patches calls to check ordering
echo "=== Looking for s_instantiate_STL_models and s_apply_ib_patches calls ==="
rg -n 's_instantiate_STL_models|s_apply_ib_patches' src/simulation/ --type fpp

echo -e "\n=== Checking if they're in correct order within m_ibm.fpp ==="
rg -n 's_instantiate_STL_models\|s_apply_ib_patches' src/simulation/m_ibm.fpp -B 2 -A 2

Repository: MFlowCode/MFC

Length of output: 238


🏁 Script executed:

#!/bin/bash
# 2. Find where gpu_ntrs/gpu_trs_v/gpu_trs_n are allocated and managed
echo "=== Finding gpu_ntrs/gpu_trs_v/gpu_trs_n allocations and GPU_ENTER_DATA ==="
rg -n 'allocate.*gpu_ntrs|allocate.*gpu_trs_v|allocate.*gpu_trs_n' src/simulation/ --type fpp

echo -e "\n=== Searching for GPU_ENTER_DATA calls with these variables ==="
rg -n 'GPU_ENTER_DATA' src/simulation/m_ib_patches.fpp -A 5 || rg -n 'GPU_ENTER_DATA' src/simulation/m_ibm.fpp -A 5

Repository: MFlowCode/MFC

Length of output: 821


🏁 Script executed:

#!/bin/bash
# 3. Verify bounding-box logic in both functions
echo "=== Checking s_ib_model for bounding-box logic ==="
sed -n '828,875p' src/simulation/m_ib_patches.fpp | grep -n 'il\|ir\|jl\|jr\|bounding' || echo "No bounding-box narrowing found in s_ib_model"

echo -e "\n=== Checking s_ib_3d_model for bounding-box logic ==="
sed -n '877,953p' src/simulation/m_ib_patches.fpp | grep -n 'il\|ir\|jl\|jr\|kl\|kr\|bounding'

Repository: MFlowCode/MFC

Length of output: 1111


🏁 Script executed:

#!/bin/bash
# 4. Check data movement for gpu_ntrs/gpu_trs_v/gpu_trs_n in the GPU loop context
echo "=== Checking GPU_PARALLEL_LOOP copyin clause for these variables ==="
sed -n '828,875p' src/simulation/m_ib_patches.fpp | grep -A 2 'GPU_PARALLEL_LOOP'

echo -e "\n=== In s_ib_3d_model ==="
sed -n '877,953p' src/simulation/m_ib_patches.fpp | grep -A 2 'GPU_PARALLEL_LOOP'

echo -e "\n=== Checking if gpu_trs_v/gpu_trs_n are explicitly referenced ==="
rg -n 'gpu_trs_v|gpu_trs_n' src/simulation/m_ib_patches.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 1440


🏁 Script executed:

#!/bin/bash
# Check the broader context of s_apply_ib_patches and see if GPU_ENTER_DATA creates persistent region
rg -n 's_apply_ib_patches' src/simulation/m_ibm.fpp -B 10 -A 20 | head -80

Repository: MFlowCode/MFC

Length of output: 2981


🏁 Script executed:

#!/bin/bash
# Look for GPU_DATA wrapping context around the kernel calls
rg -n 'GPU_DATA|GPU_ENTER_DATA|GPU_EXIT_DATA' src/simulation/m_ibm.fpp -B 2 -A 2

Repository: MFlowCode/MFC

Length of output: 493


🏁 Script executed:

#!/bin/bash
# Check f_model_is_inside_flat signature to see if it expects device arrays
rg -n 'subroutine\s+f_model_is_inside_flat|function\s+f_model_is_inside_flat' src/simulation/ -A 15

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Verify where GPU_ENTER_DATA is called and what scope it governs
rg -n 'subroutine s_instantiate_STL_models' src/simulation/m_ib_patches.fpp -A 300 | tail -150

Repository: MFlowCode/MFC

Length of output: 6345


🏁 Script executed:

#!/bin/bash
# Check if there's a persistent GPU_ENTER_DATA region wrapping both calls
# Look for GPU_ENTER_DATA without matching GPU_EXIT_DATA
rg -n 'GPU_ENTER_DATA|GPU_EXIT_DATA' src/simulation/m_ibm.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 185


🏁 Script executed:

#!/bin/bash
# Verify the exact scope of GPU_ENTER_DATA at line 1160 - is it persistent?
sed -n '1140,1170p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 1881


🏁 Script executed:

#!/bin/bash
# Check where gpu_ntrs/gpu_trs_v/gpu_trs_n might be declared with present() or accessed post-ENTER_DATA
rg -n 'present\|gpu_ntrs\|gpu_trs_v\|gpu_trs_n' src/simulation/m_ib_patches.fpp | grep -A 3 -B 3 'present'

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for how GPU data persistence is managed - look for surrounding GPU_DATA or block context
sed -n '1100,1170p' src/simulation/m_ib_patches.fpp | head -80

Repository: MFlowCode/MFC

Length of output: 3699


🏁 Script executed:

#!/bin/bash
# Check if there's a GPU_EXIT_DATA anywhere in the file
rg -n 'GPU_EXIT_DATA' src/simulation/m_ib_patches.fpp || echo "No GPU_EXIT_DATA found"

# Check broader context around GPU_ENTER_DATA call at line 1160
sed -n '1090,1170p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 4049


🏁 Script executed:

#!/bin/bash
# Look for how f_model_is_inside_flat uses the arrays - check if it's marked GPU_ROUTINE
rg -n 'f_model_is_inside_flat' src/simulation/ -A 5 | head -40

Repository: MFlowCode/MFC

Length of output: 1164


🏁 Script executed:

#!/bin/bash
# Check the GPU macro definitions to understand data region semantics
fd 'gpuParallelization\|gpu.*\.fpp' src/simulation/ && cat src/simulation/gpuParallelization.fpp 2>/dev/null | head -200 || echo "File not found, searching for macro definitions"
rg -n '@:GPU_PARALLEL_LOOP|@:GPU_DATA|@:GPU_ENTER_DATA' src/ -B 2 | head -50

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check if there's a broad GPU_DATA or GPU context managing device persistence around the entire flow
rg -n 'GPU_DATA|GPU_ENTER_DATA|GPU_EXIT_DATA|GPU_ROUTINE' src/simulation/m_ibm.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 280


🏁 Script executed:

#!/bin/bash
# Look at the main timestepping loop structure to understand device data persistence scope
rg -n 'subroutine s_time_step\|end subroutine s_time_step' src/simulation/m_ibm.fpp -A 300 | head -150

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check if the GPU arrays like gpu_trs_v are declared as module-level allocatables that persist
rg -n 'gpu_ntrs|gpu_trs_v|gpu_trs_n' src/simulation/m_ib_patches.fpp | grep -i 'allocatable\|dimension' | head -10

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for module-level declarations of gpu arrays
rg -n 'allocatable.*gpu_ntrs|allocatable.*gpu_trs_v|allocatable.*gpu_trs_n' src/simulation/m_ib_patches.fpp -B 2 -A 2

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Find where these gpu arrays are declared at module scope
rg -n 'integer.*allocatable|real.*allocatable' src/simulation/m_ib_patches.fpp | grep -i gpu

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Get the entire module-level variable declarations section
sed -n '1,100p' src/simulation/m_ib_patches.fpp | grep -n 'allocatable\|dimension'

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for the module declaration section more broadly
head -200 src/simulation/m_ib_patches.fpp | tail -100

Repository: MFlowCode/MFC

Length of output: 3972


🏁 Script executed:

#!/bin/bash
# Check module-level variable declarations in m_ib_patches
grep -n 'module m_ib_patches' src/simulation/m_ib_patches.fpp -A 50 | head -60

Repository: MFlowCode/MFC

Length of output: 2313


🏁 Script executed:

#!/bin/bash
# Verify whether gpu_trs_v/gpu_trs_n require present() or explicit data movement in nested GPU regions
# Look for documentation or examples in gpuParallelization guidance
rg -n 'present\|GPU_PARALLEL_LOOP' src/simulation/m_ibm.fpp -B 2 -A 3 | head -80

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check if any other GPU_PARALLEL_LOOP kernels use variables allocated via GPU_ENTER_DATA elsewhere
rg -n 'GPU_PARALLEL_LOOP' src/simulation/ -A 5 | grep -A 5 'copyin\|present' | head -60

Repository: MFlowCode/MFC

Length of output: 4582


🏁 Script executed:

#!/bin/bash
# Critical check: verify gpu_ntrs/gpu_trs_v/gpu_trs_n are referenced but NOT in data clauses
echo "=== In s_ib_model GPU_PARALLEL_LOOP (line ~850) ==="
sed -n '850,870p' src/simulation/m_ib_patches.fpp

echo -e "\n=== In s_ib_3d_model GPU_PARALLEL_LOOP (line ~920) ==="
sed -n '920,940p' src/simulation/m_ib_patches.fpp

echo -e "\n=== Checking for 'present' in GPU_PARALLEL_LOOP patterns ==="
rg -n 'present=' src/simulation/ | head -10

Repository: MFlowCode/MFC

Length of output: 2233


🏁 Script executed:

#!/bin/bash
# Check if gpu_ntrs/gpu_trs_v/gpu_trs_n are module-level in m_model or elsewhere
fd -e fpp -e f90 | xargs grep -l 'gpu_ntrs\|gpu_trs_v' | head -5
rg -n 'integer.*allocatable.*gpu_ntrs|real.*allocatable.*gpu_trs' src/ -B 2 | head -30

Repository: MFlowCode/MFC

Length of output: 549


🏁 Script executed:

#!/bin/bash
# Check if s_instantiate_STL_models is called before s_apply_ib_patches in BOTH initial and restart paths
echo "=== Initial setup (lines ~100-120) ==="
sed -n '100,120p' src/simulation/m_ibm.fpp

echo -e "\n=== Restart/update path (lines ~935-950) ==="
sed -n '935,950p' src/simulation/m_ibm.fpp

Repository: MFlowCode/MFC

Length of output: 1505


🏁 Script executed:

#!/bin/bash
# Verify that GPU arrays are module-level and persisted
echo "=== Module scope declaration in m_model.fpp ==="
sed -n '30,40p' src/common/m_model.fpp

echo -e "\n=== Confirm they're never deallocated ==="
rg -n 'deallocate.*gpu_ntrs|deallocate.*gpu_trs_v|deallocate.*gpu_trs_n' src/

Repository: MFlowCode/MFC

Length of output: 810


Critical: GPU model data not refreshed in s_update_mib restart path; also declare persistent GPU arrays explicitly in kernel data clauses.

The gpu_ntrs/gpu_trs_v/gpu_trs_n arrays (module-level in m_model.fpp) are allocated and pushed to device via GPU_ENTER_DATA in s_instantiate_STL_models() with persistent lifetime. However, in the restart path (s_update_mib), s_apply_ib_patches() is called at line 945 without re-calling s_instantiate_STL_models() to refresh GPU data. If s_update_ib_rotation_matrix() modifies model state, the device arrays will be stale on the next marker generation.

Additionally, both s_ib_model and s_ib_3d_model reference gpu_ntrs/gpu_trs_v/gpu_trs_n inside GPU_PARALLEL_LOOP but do not explicitly declare them in the copyin or present clauses—they rely on implicit device persistence. For clarity and robustness, add present='[gpu_ntrs, gpu_trs_v, gpu_trs_n]' to the loop declarations.

Finally, s_ib_model (2D) scans the full grid; adding the same bounding-box narrowing logic from s_ib_3d_model would improve performance for sparse STL objects.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 828 - 953, s_update_mib’s
restart path must refresh the device-resident model arrays before generating
markers: ensure s_instantiate_STL_models() (or an equivalent
GPU_ENTER_DATA/refresh routine for module arrays gpu_ntrs, gpu_trs_v, gpu_trs_n)
is called or re-entered before s_apply_ib_patches() in the restart branch so
device data reflects any changes from s_update_ib_rotation_matrix(); in
s_ib_model and s_ib_3d_model, add explicit device presence to the
GPU_PARALLEL_LOOP clauses by including present='[gpu_ntrs, gpu_trs_v,
gpu_trs_n]' (in addition to existing copyin) to avoid relying on implicit
persistence; and narrow the 2D scan in s_ib_model by reusing the bounding-index
logic from s_ib_3d_model (compute corner_distance from stl_bounding_boxes and
call get_bounding_indices to set i/j limits) to avoid scanning the full grid for
sparse STL patches.
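
For the presence clause, a minimal sketch of how the 2D loop header in s_ib_model could read, assuming the GPU_PARALLEL_LOOP macro accepts a present argument as this comment proposes; the other clauses are copied from the existing directive.

    $:GPU_PARALLEL_LOOP(private='[i, j, xy_local, eta]', &
        & copyin='[patch_id, center, inverse_rotation, offset, spc, threshold]', &
        & present='[gpu_ntrs, gpu_trs_v, gpu_trs_n]', collapse=2)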

Comment on lines +954 to +1169
subroutine s_instantiate_STL_models()

! Variables for IBM+STL
real(wp) :: normals(1:3) !< Boundary normal buffer
integer :: boundary_vertex_count, boundary_edge_count, total_vertices !< Boundary vertex
real(wp), allocatable, dimension(:, :, :) :: boundary_v !< Boundary vertex buffer
real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v !< Interpolated vertex buffer
real(wp) :: dx_local, dy_local, dz_local !< Levelset distance buffer
logical :: interpolate !< Logical variable to determine whether or not the model should be interpolated

integer :: i, j, k !< Generic loop iterators
integer :: patch_id

type(t_bbox) :: bbox, bbox_old
type(t_model) :: model
type(ic_model_parameters) :: params

real(wp) :: eta
real(wp), dimension(1:3) :: point, model_center
real(wp) :: grid_mm(1:3, 1:2)

real(wp), dimension(1:4, 1:4) :: transform, transform_n

dx_local = minval(dx); dy_local = minval(dy)
if (p /= 0) dz_local = minval(dz)

do patch_id = 1, num_ibs
if (patch_ib(patch_id)%geometry == 5 .or. patch_ib(patch_id)%geometry == 12) then
allocate (models(patch_id)%model)
print *, " * Reading model: "//trim(patch_ib(patch_id)%model_filepath)

model = f_model_read(patch_ib(patch_id)%model_filepath)
params%scale(:) = patch_ib(patch_id)%model_scale(:)
params%translate(:) = patch_ib(patch_id)%model_translate(:)
params%rotate(:) = patch_ib(patch_id)%model_rotate(:)
params%spc = patch_ib(patch_id)%model_spc
params%threshold = patch_ib(patch_id)%model_threshold

if (f_approx_equal(dot_product(params%scale, params%scale), 0._wp)) then
params%scale(:) = 1._wp
end if

if (proc_rank == 0) then
print *, " * Transforming model."
end if

! Get the model center before transforming the model
bbox_old = f_create_bbox(model)
model_center(1:3) = (bbox_old%min(1:3) + bbox_old%max(1:3))/2._wp

! Compute the transform matrices for vertices and normals
transform = f_create_transform_matrix(params, model_center)
transform_n = f_create_transform_matrix(params)

call s_transform_model(model, transform, transform_n)

! Recreate the bounding box after transformation
bbox = f_create_bbox(model)

! Show the number of vertices in the original STL model
if (proc_rank == 0) then
print *, ' * Number of input model vertices:', 3*model%ntrs
end if

call f_check_boundary(model, boundary_v, boundary_vertex_count, boundary_edge_count)

! Check if the model needs interpolation
if (p > 0) then
call f_check_interpolation_3D(model, (/dx_local, dy_local, dz_local/), interpolate)
else
call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate)
end if
interpolate = .false.

! Show the number of edges and boundary edges in 2D STL models
if (proc_rank == 0 .and. p == 0) then
print *, ' * Number of 2D model boundary edges:', boundary_edge_count
end if

! Interpolate the STL model along the edges (2D) and on triangle facets (3D)
if (interpolate) then
if (proc_rank == 0) then
print *, ' * Interpolating STL vertices.'
end if

if (p > 0) then
call f_interpolate_3D(model, (/dx, dy, dz/), interpolated_boundary_v, total_vertices)
else
call f_interpolate_2D(boundary_v, boundary_edge_count, (/dx, dy, dz/), interpolated_boundary_v, total_vertices)
end if

if (proc_rank == 0) then
print *, ' * Total number of interpolated boundary vertices:', total_vertices
end if
end if

if (proc_rank == 0) then
write (*, "(A, 3(2X, F20.10))") " > Model: Min:", bbox%min(1:3)
write (*, "(A, 3(2X, F20.10))") " > Cen:", (bbox%min(1:3) + bbox%max(1:3))/2._wp
write (*, "(A, 3(2X, F20.10))") " > Max:", bbox%max(1:3)

grid_mm(1, :) = (/minval(x_cc(0:m)) - 0.5_wp*dx_local, maxval(x_cc(0:m)) + 0.5_wp*dx_local/)
grid_mm(2, :) = (/minval(y_cc(0:n)) - 0.5_wp*dy_local, maxval(y_cc(0:n)) + 0.5_wp*dy_local/)

if (p > 0) then
grid_mm(3, :) = (/minval(z_cc(0:p)) - 0.5_wp*dz_local, maxval(z_cc(0:p)) + 0.5_wp*dz_local/)
else
grid_mm(3, :) = (/0._wp, 0._wp/)
end if

write (*, "(A, 3(2X, F20.10))") " > Domain: Min:", grid_mm(:, 1)
write (*, "(A, 3(2X, F20.10))") " > Cen:", (grid_mm(:, 1) + grid_mm(:, 2))/2._wp
write (*, "(A, 3(2X, F20.10))") " > Max:", grid_mm(:, 2)
end if

allocate (stl_bounding_boxes(patch_id, 1:3, 1:3))
stl_bounding_boxes(patch_id, 1, 1:3) = [bbox%min(1), (bbox%min(1) + bbox%max(1))/2._wp, bbox%max(1)]
stl_bounding_boxes(patch_id, 2, 1:3) = [bbox%min(2), (bbox%min(2) + bbox%max(2))/2._wp, bbox%max(2)]
stl_bounding_boxes(patch_id, 3, 1:3) = [bbox%min(3), (bbox%min(3) + bbox%max(3))/2._wp, bbox%max(3)]

models(patch_id)%model = model
models(patch_id)%boundary_v = boundary_v
models(patch_id)%boundary_edge_count = boundary_edge_count
if (interpolate) then
models(patch_id)%interpolate = 1
else
models(patch_id)%interpolate = 0
end if
if (interpolate) then
models(patch_id)%interpolated_boundary_v = interpolated_boundary_v
models(patch_id)%total_vertices = total_vertices
end if

end if
end do

! Pack and upload flat arrays for GPU (AFTER the loop)
block
integer :: pid, max_ntrs
integer :: max_bv1, max_bv2, max_bv3, max_iv1, max_iv2

max_ntrs = 0
max_bv1 = 0; max_bv2 = 0; max_bv3 = 0
max_iv1 = 0; max_iv2 = 0

do pid = 1, num_ibs
if (allocated(models(pid)%model)) then
call s_pack_model_for_gpu(models(pid))
max_ntrs = max(max_ntrs, models(pid)%ntrs)
end if
if (allocated(models(pid)%boundary_v)) then
max_bv1 = max(max_bv1, size(models(pid)%boundary_v, 1))
max_bv2 = max(max_bv2, size(models(pid)%boundary_v, 2))
max_bv3 = max(max_bv3, size(models(pid)%boundary_v, 3))
end if
if (allocated(models(pid)%interpolated_boundary_v)) then
max_iv1 = max(max_iv1, size(models(pid)%interpolated_boundary_v, 1))
max_iv2 = max(max_iv2, size(models(pid)%interpolated_boundary_v, 2))
end if
end do

if (max_ntrs > 0) then
allocate (gpu_ntrs(1:num_ibs))
allocate (gpu_trs_v(1:3, 1:3, 1:max_ntrs, 1:num_ibs))
allocate (gpu_trs_n(1:3, 1:max_ntrs, 1:num_ibs))
allocate (gpu_interpolate(1:num_ibs))
allocate (gpu_boundary_edge_count(1:num_ibs))
allocate (gpu_total_vertices(1:num_ibs))

gpu_ntrs = 0
gpu_trs_v = 0._wp
gpu_trs_n = 0._wp
gpu_interpolate = 0
gpu_boundary_edge_count = 0
gpu_total_vertices = 0

if (max_bv1 > 0) then
allocate (gpu_boundary_v(1:max_bv1, 1:max_bv2, 1:max_bv3, 1:num_ibs))
gpu_boundary_v = 0._wp
end if

if (max_iv1 > 0) then
allocate (gpu_interpolated_boundary_v(1:max_iv1, 1:max_iv2, 1:num_ibs))
gpu_interpolated_boundary_v = 0._wp
end if

do pid = 1, num_ibs
if (allocated(models(pid)%model)) then
gpu_ntrs(pid) = models(pid)%ntrs
gpu_trs_v(:, :, 1:models(pid)%ntrs, pid) = models(pid)%trs_v
gpu_trs_n(:, 1:models(pid)%ntrs, pid) = models(pid)%trs_n
gpu_interpolate(pid) = models(pid)%interpolate
gpu_boundary_edge_count(pid) = models(pid)%boundary_edge_count
gpu_total_vertices(pid) = models(pid)%total_vertices
end if
if (allocated(models(pid)%boundary_v)) then
gpu_boundary_v(1:size(models(pid)%boundary_v, 1), &
1:size(models(pid)%boundary_v, 2), &
1:size(models(pid)%boundary_v, 3), pid) = models(pid)%boundary_v
end if
if (allocated(models(pid)%interpolated_boundary_v)) then
gpu_interpolated_boundary_v(1:size(models(pid)%interpolated_boundary_v, 1), &
1:size(models(pid)%interpolated_boundary_v, 2), pid) = models(pid)%interpolated_boundary_v
end if
end do

$:GPU_ENTER_DATA(copyin='[gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices]')
if (allocated(gpu_boundary_v)) then
$:GPU_ENTER_DATA(copyin='[gpu_boundary_v]')
end if
if (allocated(gpu_interpolated_boundary_v)) then
$:GPU_ENTER_DATA(copyin='[gpu_interpolated_boundary_v]')
end if
end if
end block


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for all occurrences of stl_bounding_boxes in the codebase
echo "=== stl_bounding_boxes allocations and usage ==="
rg -n "stl_bounding_boxes" src/simulation/m_ib_patches.fpp src/common/m_model.fpp 2>/dev/null || echo "Not found in specified files"

# Broader search
rg -n "stl_bounding_boxes" --type f90 --type fpp 2>/dev/null || true

Repository: MFlowCode/MFC

Length of output: 999


🏁 Script executed:

#!/bin/bash
# Check total_vertices initialization and usage pattern
echo "=== total_vertices usage pattern ==="
rg -n "total_vertices" src/simulation/m_ib_patches.fpp | head -50

# Check gpu_total_vertices
echo "=== gpu_total_vertices pattern ==="
rg -n "gpu_total_vertices" src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 1328


🏁 Script executed:

#!/bin/bash
# Check gpu_* allocations and deallocations
echo "=== gpu_* array allocations ==="
rg -n "allocate.*gpu_" src/simulation/m_ib_patches.fpp | head -30

echo "=== gpu_* deallocations ==="
rg -n "deallocate.*gpu_" src/simulation/m_ib_patches.fpp | head -30

# Check if there's a finalize/cleanup subroutine
echo "=== Potential cleanup routines ==="
rg -n "subroutine.*finalize|subroutine.*cleanup" src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 855


🏁 Script executed:

#!/bin/bash
# Check the module-level declarations to understand data structure
echo "=== Module-level declarations for gpu_* and stl_bounding_boxes ==="
rg -n "real.*gpu_|integer.*gpu_|allocatable.*gpu_|real.*stl_bounding|allocatable.*stl_bounding" src/simulation/m_ib_patches.fpp | head -40

# Look for module/global scope context
head -150 src/simulation/m_ib_patches.fpp | tail -100

Repository: MFlowCode/MFC

Length of output: 4044


s_instantiate_STL_models has critical allocation/initialization issues breaking multi-model scenarios.

All four issues are confirmed:

  1. stl_bounding_boxes allocation inside loop: Line 1069 allocates with allocate(stl_bounding_boxes(patch_id, 1:3, 1:3)) within the do patch_id = 1, num_ibs loop. This will fail on the second iteration—the first allocates shape (1,3,3), the second attempts (2,3,3), but the array is already allocated. Should allocate once before the loop with shape (num_ibs, 1:3, 1:3).

  2. total_vertices uninitialized when interpolate=false: The local variable total_vertices (line 958) is only assigned at lines 1040 or 1042 inside the if (interpolate) block. Line 1084 unconditionally copies it into models(patch_id)%total_vertices, propagating garbage values when interpolation is skipped. Initialize or conditionally assign.

  3. Incorrect spacing vector in interpolation: Lines 1040 and 1042 pass (/dx, dy, dz/) (array elements, not spacing magnitudes), though this is currently masked by line 1037 forcing interpolate = .false.. If interpolation is re-enabled, use (/dx_local, dy_local, dz_local/).

  4. Missing allocation guards on gpu_ arrays*: Lines 1116–1121 allocate module-level gpu_* arrays without checking if already allocated. Calling s_instantiate_STL_models twice will fail. Add guards: if (.not. allocated(gpu_ntrs)) allocate(...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 954 - 1169,
s_instantiate_STL_models has four fixes: allocate stl_bounding_boxes once before
the patch loop with shape (num_ibs,1:3,1:3) instead of calling
allocate(stl_bounding_boxes(patch_id,...)) inside the loop; ensure
total_vertices is initialized or only copied into
models(patch_id)%total_vertices when interpolate is true (do not propagate
uninitialized total_vertices when interpolation is skipped); pass spacing
magnitudes to the interpolation routines by using (/dx_local, dy_local,
dz_local/) when calling f_interpolate_3D/f_interpolate_2D (and remove or fix the
unconditional interpolate = .false. that currently masks interpolation logic);
and protect module-level gpu_* allocations (gpu_ntrs, gpu_trs_v, gpu_trs_n,
gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices, gpu_boundary_v,
gpu_interpolated_boundary_v) with if (.not. allocated(...)) checks before each
allocate to allow repeated calls to s_instantiate_STL_models.
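
A minimal sketch covering the first and fourth items, assuming the bounding-box allocation is hoisted above the do patch_id = 1, num_ibs loop and that the same allocated() guard pattern is then applied to the gpu_* allocations later in the block; the zero fill is illustrative.

        ! Allocate once with the full first extent instead of per patch inside the
        ! loop; the guard keeps a repeated call from hitting an already-allocated array.
        if (.not. allocated(stl_bounding_boxes)) then
            allocate (stl_bounding_boxes(1:num_ibs, 1:3, 1:3))
            stl_bounding_boxes = 0._wp
        end if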

Comment on lines +687 to 688
self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
"num_ibs must be between 1 and num_patches_max (10)")

⚠️ Potential issue | 🟡 Minor

Fix outdated error message for num_ibs upper bound.

The validation allows up to 1000, but the message still says 10, which will mislead users. Consider deriving the bound from CONSTRAINTS to avoid future drift.

✅ Suggested fix
-        self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
-                     "num_ibs must be between 1 and num_patches_max (10)")
+        max_ibs = CONSTRAINTS["num_ibs"]["max"]
+        self.prohibit(ib and (num_ibs <= 0 or num_ibs > max_ibs),
+                     f"num_ibs must be between 1 and num_patches_max ({max_ibs})")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@toolchain/mfc/case_validator.py` around lines 687 - 688, The error message
passed to self.prohibit for the num_ibs check is stale (mentions 10) while the
actual upper bound is 1000; update the message to reflect the real max and, to
prevent future drift, construct the message using the authoritative constraint
value (e.g. the CONSTRAINTS constant or the class attribute used for bounds)
rather than a hardcoded literal—locate the self.prohibit call that checks
num_ibs (the line with self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
...)) and interpolate/format the upper-bound from CONSTRAINTS (or
self.CONSTRAINTS / CONSTRAINTS['num_ibs']['max'] as appropriate) so the message
reads "num_ibs must be between 1 and <max>".

@codeant-ai

codeant-ai bot commented Feb 20, 2026

CodeAnt AI is reviewing your PR.



@codeant-ai codeant-ai bot added the size:XL This PR changes 500-999 lines, ignoring generated files label Feb 20, 2026
@codeant-ai

codeant-ai bot commented Feb 20, 2026

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • GPU Kernel Data Missing
    Several newly-added GPU parallel loops (notably the STL model kernels such as s_ib_model and s_ib_3d_model) reference arrays and device buffers (e.g. x_cc, y_cc, dx, dy, dz, gpu_ntrs, gpu_trs_v, gpu_trs_n) that are not listed in the loop's copyin/copy lists. If those symbols are not available on-device by other means, kernels may read stale or host-only data, producing incorrect IB markings on the device. Verify all GPU_PARALLEL_LOOP directives include every array and device buffer the kernel reads.

  • Atomic counter / reduction
    Several GPU-parallel loops use local counters with copyin and then update them with GPU atomics (e.g. count, count_i, num_gps_local, num_inner_gps_local). The combination of copyin (per-thread copy) and atomic capture/update as used may not provide the intended global accumulation on device memory and can cause lost updates or incorrect final counts after the kernel. Verify that the counters are allocated in device-shared memory (or passed by device pointer) and that the atomic operations target those device-resident variables. Also ensure correct initialization on the device before the parallel region.

  • Memory / GPU upload sizing
    The STL packing and GPU upload path computes maxima (max_ntrs, max_bv*, max_iv*) and then allocates device-side arrays sized by num_ibs and these maxima. Check worst-case sizes (num_ibs up to 1000 per PR) to ensure memory fits on device. Also verify repeated allocate()/copyin sequences won't be invoked frequently at runtime (should be done once at initialization), and that the pack/enter-data sequence matches the lifetime of host-side arrays. A rough worked size estimate is sketched after this list.

  • Large compile-time bound
    The constant num_patches_max was increased from 10 to 1000. Verify all arrays, derived sizes, and routines that use this parameter can handle the larger bound without excessive memory usage, stack/static allocation problems, or performance regressions. Check MPI I/O, sizing of scratch buffers, and GPU memory footprints that depend on this constant.

  • Neighbor indexing / boundary safety
    The marker-neighborhood search loops directly index neighbors using ranges like i - gp_layers : i + gp_layers (and similarly for j, k) without explicit bounds checks in the new code paths. If gp_layers is larger than the buffer width, or if the indices fall close to the domain boundaries, this may read outside the intended domain or into invalid memory on the device. Confirm that the buffer/bounds guarantees still hold on all runs, or explicitly clamp the indices to the valid buffered range.

  • Interpolate type mismatch
    The variable interpolate was changed from a logical to an integer and is later tested both as a boolean (if (interpolate) then) and by equality (if (interpolate == 1) then). This inconsistent usage can be a source of subtle bugs if gpu_interpolate(...) doesn't return exactly the expected integer values (0/1). Ensure the value/type of gpu_interpolate and all checks are consistent.

  • Possible Bug
    The field interpolate was changed from logical to integer. Call-sites throughout the codebase may still treat it as a logical flag; this can change semantics and introduce subtle bugs. Verify every use-site, any comparisons or logical tests, and any GPU transfer code that expects booleans.

  • GPU-device compatibility
    Several routines and data accesses used for STL models were switched to GPU-resident arrays (e.g., gpu_trs_v, gpu_trs_n, gpu_interpolated_boundary_v, gpu_boundary_v) and GPU-enabled calls (e.g., f_distance_normals_3D_flat, f_interpolated_distance, f_normals, f_distance). Verify that these functions are device-capable (compiled for GPU) and accept the new GPU-side data layout. Also ensure proper data residency/synchronization so no host-only arrays are accessed on device implicitly.

  • RNG quality & parallel safety
    The PR replaces Fortran's RNG with a custom integer xorshift in f_model_random_number for use on the GPU. This is a low-quality PRNG for Monte-Carlo style ray sampling (potential bias) and the normalization (dividing by huge(seed)) may be fragile/implementation-dependent. Also the seed is derived deterministically from point coordinates and may create correlated samples for neighboring points or patterns across the domain. Verify statistical properties, reproducibility, and that the seed state is safe for parallel GPU execution.

  • Robustness: tri_idx usage
    In f_distance_normals_3D_flat (and similar routines) tri_idx is initialized to 0 and later used to index triangle arrays. If ntrs can ever be zero (or none chosen as closest), indexing with 0 will be invalid. Add defensive checks or ensure ntrs>0 is always true before using tri_idx.
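
A rough worked estimate for the "Memory / GPU upload sizing" item above, under stated assumptions (double precision, so 8 bytes per real; num_ibs = 1000; a hypothetical max_ntrs of 1e5 triangles): gpu_trs_v is shaped (3, 3, max_ntrs, num_ibs), i.e. 3 x 3 x 1e5 x 1000 x 8 B ≈ 7.2 GB for that one array, and gpu_trs_n adds 3 x 1e5 x 1000 x 8 B ≈ 2.4 GB. With max_ntrs around 1e4 the same pair drops to roughly 0.72 GB + 0.24 GB, so padding every patch to the largest model's triangle count is what drives the device footprint when triangle counts vary widely across the 1000 patches.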

dir(k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
end do
dir_mag = sqrt(dir(1)*dir(1) + dir(2)*dir(2) + dir(3)*dir(3))
dir(:) = dir(:)/dir_mag

Suggestion: Guard the normalization of the ray direction in the flat "inside" check to avoid potential division by zero in degenerate cases. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
dir(:) = dir(:)/dir_mag

Replaced with:

if (dir_mag > 0._wp) then
    dir(:) = dir(:)/dir_mag
else
    dir(:) = 0._wp
end if
Why it matters? ⭐

Normalizing a zero-length direction vector can produce a division-by-zero or NaNs; guarding with a magnitude check directly fixes a correctness/numerical stability issue (priority #1 in the MFC rules). The suggested conditional is syntactically valid Fortran and directly prevents the unsafe division.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/common/m_model.fpp
**Line:** 613:613
**Comment:**
	*Custom Rule: Guard the normalization of the ray direction in the flat "inside" check to avoid potential division by zero in degenerate cases.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

integer :: i

ma%ntrs = ma%model%ntrs
allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))

Suggestion: Safely deallocate any existing packed triangle buffers before reallocating them to prevent memory leaks when repacking models for the GPU. [custom_rule]

Severity Level: Minor ⚠️

Suggested change (the guards run ahead of the existing allocate calls):
if (allocated(ma%trs_v)) deallocate(ma%trs_v)
if (allocated(ma%trs_n)) deallocate(ma%trs_n)
allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
Why it matters? ⭐

The PR added allocation of packed triangle buffers but did not guard against repeated calls to s_pack_model_for_gpu leaving previously allocated arrays attached — this directly violates the MFC memory rule about matching ALLOCATE/DEALLOCATE pairing. The proposed change (deallocate if allocated before reallocate) is a minimal, correct fix that prevents leaks and addresses the custom rule about paired allocation/deallocation.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/common/m_model.fpp
**Line:** 1399:1399
**Comment:**
	*Custom Rule: Safely deallocate any existing packed triangle buffers before reallocating them to prevent memory leaks when repacking models for the GPU.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.


integer :: i, j, k, patch_id, boundary_edge_count, total_vertices
logical :: interpolate
integer :: interpolate

Suggestion: Use a logical type for the interpolation flag to keep the conditional checks standard-conforming and portable across compilers. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
integer :: interpolate
logical :: interpolate
Why it matters? ⭐

The PR declares interpolate as an INTEGER but elsewhere in this same subroutine it is tested in a logical context (e.g. if (interpolate) then in the 3D branch). Using an INTEGER in a logical expression is non‑standard Fortran and will break compiler portability (violates the repository's compiler portability guidance). Changing the declaration to LOGICAL directly addresses that rule violation and makes the intent explicit and standard-conforming.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/simulation/m_compute_levelset.fpp
**Line:** 650:650
**Comment:**
	*Custom Rule: Use a logical type for the interpolation flag to keep the conditional checks standard-conforming and portable across compilers.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate)
end if
interpolate = .false.


Suggestion: Preserve the result of the interpolation check instead of unconditionally disabling interpolation to avoid changing STL model resolution behavior. [custom_rule]

Severity Level: Minor ⚠️

Suggested change (remove this line)
interpolate = .false.
Why it matters? ⭐

The PR calls the interpolation-check routines but then unconditionally sets interpolate = .false., which negates the purpose of the checks. Per the repository's PR-pattern guidance (STL model interpolation / f_check_interpolation_* usage), the computed interpolate flag should be preserved so models that require interpolation remain interpolated. Removing the forced override directly restores correct behavior; this fixes a functional regression rather than being a cosmetic change.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/simulation/m_ib_patches.fpp
**Line:** 1032:1032
**Comment:**
	*Custom Rule: Preserve the result of the interpolation check instead of unconditionally disabling interpolation to avoid changing STL model resolution behavior.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

"Immersed Boundaries do not work in 1D (requires n > 0)")
self.prohibit(ib and (num_ibs <= 0 or num_ibs > 10),
self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
"num_ibs must be between 1 and num_patches_max (10)")

Suggestion: Update the immersed boundaries count validation message so its documented range matches the new numerical constraint. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
"num_ibs must be between 1 and num_patches_max (10)")
"num_ibs must be between 1 and 1000")
Why it matters? ⭐

The current code enforces num_ibs <= 1000 but the error message still references num_patches_max (10), which is inconsistent and confusing for users. The proposed change updates the message to reflect the actual numeric constraint (1..1000), directly fixing a correctness/clarity issue related to constraint reporting. This aligns with the repository guidance to prioritize correctness and clear parameter constraints.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** toolchain/mfc/case_validator.py
**Line:** 688:688
**Comment:**
	*Custom Rule: Update the immersed boundaries count validation message so its documented range matches the new numerical constraint.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

real(wp), dimension(1:3) :: dist_buffer
integer :: i, j, tri_idx

dist_min = 1.e12_wp

Suggestion: In f_distance_normals_3D_flat, tri_idx remains zero when ntrs is zero, leading to an out-of-bounds access trs_n(:, tri_idx, pid) and undefined behavior; you should explicitly handle the ntrs <= 0 case before the loop by returning a safe default normal and distance. [off-by-one]

Severity Level: Major ⚠️
- ❌ 3D flat distance/normals routine may read out-of-bounds.
- ⚠️ GPU levelset computations for STL IBs may crash.
Suggested change
dist_min = 1.e12_wp
! Handle degenerate case with no triangles to avoid invalid indexing
if (ntrs <= 0) then
    normals = 0._wp
    distance = 0._wp
    return
end if
Steps of Reproduction ✅
1. Build the code including `module m_model` in `src/common/m_model.fpp`, ensuring
bounds-checking is enabled (e.g., `-Mbounds` with NVHPC or `-check bounds` with ifort).

2. From any caller (e.g., a level-set or IBM GPU packing routine) invoke
`f_distance_normals_3D_flat` (defined at lines 1203–1261 in `src/common/m_model.fpp`) with
`ntrs = 0`, and with `trs_v`/`trs_n` passed from a `t_model_array` whose `trs_v`/`trs_n`
were allocated via `s_pack_model_for_gpu` at lines 1395–1406 (which allocates them with
lower bound 1 in the triangle dimension).

3. When `f_distance_normals_3D_flat` executes, the `do i = 1, ntrs` loop (lines 1215–1236)
is skipped because `ntrs = 0`, leaving `tri_idx` at its initialized value of 0.

4. The routine then executes the assignments to `normals` at lines 1256–1258, accessing
`trs_n(:, tri_idx, pid)` with `tri_idx = 0`; since `trs_n` is indexed from 1, this is an
out-of-bounds access that will either trigger a runtime bounds error (with checking on) or
cause undefined behavior/incorrect normals on CPU or GPU.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/common/m_model.fpp
**Line:** 1222:1222
**Comment:**
	*Off By One: In `f_distance_normals_3D_flat`, `tri_idx` remains zero when `ntrs` is zero, leading to an out-of-bounds access `trs_n(:, tri_idx, pid)` and undefined behavior; you should explicitly handle the `ntrs <= 0` case before the loop by returning a safe default normal and distance.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

"Immersed Boundaries do not work in 1D (requires n > 0)")
self.prohibit(ib and (num_ibs <= 0 or num_ibs > 10),
self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
"num_ibs must be between 1 and num_patches_max (10)")

Suggestion: The validation condition now allows up to 1000 immersed boundaries, but the associated error message still says the maximum is 10, which will confuse users when num_ibs > 1000 is rejected with a message that states an incorrect upper limit; the message should be updated to reflect the new upper bound so that validation feedback matches actual behavior. [logic error]

Severity Level: Major ⚠️
- ⚠️ IBM case validation message misstates allowed num_ibs range.
- ⚠️ Users confused configuring more than ten immersed boundaries.
Suggested change
"num_ibs must be between 1 and num_patches_max (10)")
"num_ibs must be between 1 and 1000")
Steps of Reproduction ✅
1. Create a case file (e.g., `case.py`) that enables immersed boundaries with `ib = 'T'`
and sets `num_ibs = 1200` in the case parameters (mapped to keys `ib` and `num_ibs` in the
params dictionary passed to the validator).

2. Run the validation tooling, e.g. via `./mfc.sh validate case.py` as suggested in
`toolchain/mfc/case_validator.py:_format_errors()` (near the end of the file), which calls
`validate_case_constraints(params, stage='simulation')`.

3. Inside `validate_case_constraints()` (bottom of `toolchain/mfc/case_validator.py`), a
`CaseValidator` is created and `validator.validate('simulation')` is invoked, which in
turn calls `validate_simulation()`, then `validate_common()`, and finally `check_ibm()`
(around lines 679–691).

4. In `check_ibm()`, the condition `ib and (num_ibs <= 0 or num_ibs > 1000)` at line 687
evaluates True for `num_ibs = 1200`, so `self.prohibit()` records the error message
`"num_ibs must be between 1 and num_patches_max (10)"`; when `validate()` raises
`CaseConstraintError`, the user sees this misleading message that claims a maximum of 10
instead of the actual limit 1000.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** toolchain/mfc/case_validator.py
**Line:** 688:688
**Comment:**
	*Logic Error: The validation condition now allows up to 1000 immersed boundaries, but the associated error message still says the maximum is 10, which will confuse users when `num_ibs > 1000` is rejected with a message that states an incorrect upper limit; the message should be updated to reflect the new upper bound so that validation feedback matches actual behavior.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

@codeant-ai

codeant-ai bot commented Feb 20, 2026

CodeAnt AI finished reviewing your PR.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/simulation/m_ibm.fpp (2)

457-530: 🛠️ Refactor suggestion | 🟠 Major

Use deviceptr instead of private for the pointer variable s_cc.

The documentation specifies that deviceptr is the parameter for pointer variables in GPU regions. Currently, s_cc is declared in the private list despite being a pointer that is reassigned via => to x_cc, y_cc, or z_cc inside the kernel. Replace s_cc in the private list with deviceptr='[s_cc]' to follow the documented GPU macro API, or refactor to eliminate the pointer reassignment entirely by directly indexing the coordinate arrays based on the dim variable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 457 - 530, The GPU kernel macro
invocation lists s_cc as private but s_cc is a pointer reassigned (via =>) to
x_cc, y_cc, or z_cc inside the loop; change the kernel parameter handling to use
deviceptr for this pointer or eliminate the pointer reassignment. Specifically,
update the GPU region (the $:GPU_PARALLEL_LOOP(...) / $:END_GPU_PARALLEL_LOOP
block) so that s_cc is passed as deviceptr='[s_cc]' instead of appearing in the
private list, or refactor the inner loop in the do dim loop to index x_cc, y_cc,
z_cc directly based on dim (removing s_cc and its pointer reassignments) while
keeping the rest of the logic that computes ip_loc, ip_grid, norm, and bound
intact (references: s_cc, x_cc, y_cc, z_cc, ghost_points_in, GPU_PARALLEL_LOOP).

1135-1143: ⚠️ Potential issue | 🟠 Major

Missing deallocation of ghost_points, inner_points, and models in finalize.

ghost_points and inner_points are allocated at lines 129–130 and entered into GPU data at line 132 via GPU_ENTER_DATA(copyin='[ghost_points,inner_points]') in s_ibm_setup, but s_finalize_ibm_module does not clean them up. Similarly, models is allocated at line 73 in s_initialize_ibm_module but never deallocated. This leaks both host and GPU memory.

Proposed fix
  impure subroutine s_finalize_ibm_module()

      @:DEALLOCATE(ib_markers%sf)
+     if (allocated(ghost_points)) then
+         $:GPU_EXIT_DATA(delete='[ghost_points,inner_points]')
+         @:DEALLOCATE(ghost_points)
+         @:DEALLOCATE(inner_points)
+     end if
+     if (allocated(models)) then
+         @:DEALLOCATE(models)
+     end if
      if (allocated(airfoil_grid_u)) then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 1135 - 1143, s_finalize_ibm_module
currently deallocates sf and airfoil grids but misses freeing ghost_points,
inner_points and models allocated earlier; update s_finalize_ibm_module to
deallocate ghost_points, inner_points and models (matching allocations in
s_ibm_setup and s_initialize_ibm_module) and ensure any corresponding GPU data
is exited/released if present (mirror GPU_ENTER_DATA copyin for
ghost_points/inner_points). Locate the allocations for models (in
s_initialize_ibm_module) and ghost_points/inner_points (in s_ibm_setup) and add
corresponding DEALLOCATE calls and GPU exit/cleanup in s_finalize_ibm_module to
prevent host and device memory leaks.
🧹 Nitpick comments (3)
src/simulation/m_ib_patches.fpp (3)

929-931: Loop order in s_ib_3d_model is inconsistent with other 3D kernels and causes non-coalesced GPU writes.

All other 3D patch routines (s_ib_sphere, s_ib_cuboid, s_ib_cylinder) use k outermost / j middle / i innermost so that consecutive vector-lane threads write to contiguous ib_markers%sf(i,j,k) locations (Fortran column-major). This kernel has i outermost / k innermost, meaning adjacent vector lanes stride across the (i,j) plane.

♻️ Proposed fix — reverse loop order to match other 3D kernels
-        do i = il, ir
-            do j = jl, jr
-                do k = kl, kr
+        do k = kl, kr
+            do j = jl, jr
+                do i = il, ir
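
As a side note, a tiny standalone Fortran program (not MFC code) makes the column-major argument above concrete: the first index varies fastest in memory, so keeping i innermost gives unit-stride writes.

program column_major_demo
    implicit none
    integer :: a(2, 3), flat(6)
    integer :: i, j
    do j = 1, 3
        do i = 1, 2
            a(i, j) = 10*i + j       ! value encodes (i, j)
        end do
    end do
    flat = reshape(a, [6])           ! memory order of a
    print *, flat                    ! 11 21 12 22 13 23 -> i runs fastest
end program column_major_demo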
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 929 - 931, The s_ib_3d_model
kernel currently iterates do i, do j, do k which is inconsistent with other 3D
patch routines and causes non-coalesced GPU writes to ib_markers%sf(i,j,k);
change the loop order to match s_ib_sphere / s_ib_cuboid / s_ib_cylinder by
making k the outermost loop, j the middle, and i the innermost (i.e., iterate k
then j then i) so adjacent vector-lane threads write contiguous memory in
Fortran column-major layout.

814-829: s_ib_ellipse is the only 2D patch still scanning the full grid — add get_bounding_indices narrowing.

Every other 2D/3D primitive (s_ib_circle, s_ib_rectangle, s_ib_airfoil, etc.) was updated to call get_bounding_indices in this PR, but s_ib_ellipse still iterates over the full ghost-extended domain. For the 1000-particle use case this is a clear outlier.

♻️ Proposed narrowing for s_ib_ellipse
+        integer :: il, ir, jl, jr
+        real(wp) :: corner_distance
         ! Transferring the ellipse's centroid and length information
         center(1) = patch_ib(patch_id)%x_centroid
         center(2) = patch_ib(patch_id)%y_centroid
         ellipse_coeffs(1) = 0.5_wp*patch_ib(patch_id)%length_x
         ellipse_coeffs(2) = 0.5_wp*patch_ib(patch_id)%length_y
         inverse_rotation(:, :) = patch_ib(patch_id)%rotation_matrix_inverse(:, :)

+        il = -gp_layers; ir = m + gp_layers
+        jl = -gp_layers; jr = n + gp_layers
+        corner_distance = sqrt(dot_product(ellipse_coeffs, ellipse_coeffs))
+        call get_bounding_indices(center(1) - corner_distance, center(1) + corner_distance, x_cc, il, ir)
+        call get_bounding_indices(center(2) - corner_distance, center(2) + corner_distance, y_cc, jl, jr)
+
         $:GPU_PARALLEL_LOOP(private='[i,j, xy_local]',&
                   & copyin='[patch_id,center,ellipse_coeffs,inverse_rotation,x_cc,y_cc]', collapse=2)
-        do j = -gp_layers, n + gp_layers
-            do i = -gp_layers, m + gp_layers
+        do j = jl, jr
+            do i = il, ir
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 814 - 829, s_ib_ellipse
currently scans the full ghost-extended grid inside the GPU_PARALLEL_LOOP;
update it to compute a tight loop range using the existing get_bounding_indices
helper (like other primitives) and only iterate i,j over that bounding box
(clamped by the ghost layers gp_layers and domain extents m,n). Specifically,
before the double loop, call get_bounding_indices with the ellipse center and
axes (center, ellipse_coeffs, inverse_rotation) to produce
min_i,max_i,min_j,max_j, then replace the do j/from -gp_layers,n+gp_layers and
do i/from -gp_layers,m+gp_layers ranges with the returned bounded ranges; keep
the same body that computes xy_local (using x_cc,y_cc) and sets
ib_markers%sf(i,j,0)=patch_id when inside the ellipse so behavior and data
references remain unchanged.

62-114: GPU checklist for src/simulation/ changes not confirmed complete.

Per PR objectives the GPU checklist items ("GPU results matching CPU" and "GPU vendor coverage") were listed but not checked. This PR introduces two new GPU kernels (s_ib_model, s_ib_3d_model) and the get_bounding_indices refactor across all IB primitives. Based on learnings: "For GPU code changes in src/simulation/, complete the GPU checklist in the PR template."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 62 - 114, The PR omits
completing the GPU checklist for changes that add GPU kernels—specifically
s_ib_model and s_ib_3d_model—and the get_bounding_indices refactor; run and
record GPU validation by (1) building and running the GPU test suite that covers
all IB primitives including s_ib_model and s_ib_3d_model, (2) comparing GPU
results to CPU baselines (numerical diffs/plots) and attaching the logs, and (3)
confirming vendor coverage by running the same GPU tests on available vendors
(e.g., NVIDIA, AMD, Intel) or noting which vendors were tested and any failures;
update the PR template checklist with pass/fail status, attach the test
commands, output files, and any small fixes needed to make get_bounding_indices
and the new kernels GPU-compatible.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/common/m_model.fpp`:
- Line 576: The expression computing fraction uses default real kind; update the
conversion to enforce working precision by replacing real(nInOrOut)/real(spc)
with real(nInOrOut, wp)/real(spc, wp) and apply the same change in the
f_model_is_inside_flat occurrence (line referencing spc and nInOrOut) so both
numerator and denominator use the wp kind consistently.
- Around line 541-543: The hash constants are written with the `_wp` suffix
(e.g. 73856093_wp) which creates real literals that will be rounded in
single-precision builds; change them to use explicit conversion from integer to
the working kind, e.g. replace occurrences like int(point(1)*73856093_wp) with
int(point(1)*real(73856093, wp)) (or equivalently real(73856093, wp) wrapped as
needed) in the rand_seed computation in m_model (the lines setting rand_seed)
and make the identical change in f_model_is_inside_flat (the three constants on
lines ~599–601) so the integer constants are converted explicitly to kind wp
instead of being written as real literals.

In `@src/simulation/m_compute_levelset.fpp`:
- Line 694: The conditional uses an integer "interpolate" as a logical; change
the check at the if statement (currently "if (interpolate)") to an explicit
integer comparison "if (interpolate == 1)" to match the other usage in this
routine and standard Fortran; update the same conditional in routine
m_compute_levelset (variable interpolate) so all branches use "interpolate == 1"
consistently.

In `@src/simulation/m_ib_patches.fpp`:
- Around line 941-949: s_ib_3d_model is dereferencing
patch_ib(patch_id)%model_threshold inside the GPU kernel even though you already
copied the scalar threshold into the kernel; change the kernel check to use the
local scalar threshold (the scalar set at line ~910 and listed in copyin)
instead of accessing patch_ib(patch_id)%model_threshold so the kernel uses the
device-local scalar and avoids a possible host-to-device runtime fault — update
the conditional around the call to f_model_is_inside_flat and the assignment to
ib_markers%sf(i,j,k) to compare eta against threshold rather than patch_ib(...).
- Around line 428-430: The z-direction bounding call to get_bounding_indices
currently reuses ca_in and thus can exclude wing-tip cells; change the third
call so the z bounds use the correct half-span (use max(ca_in, lz/2) or at least
lz/2) instead of ca_in by passing center(3) - z_bound and center(3) + z_bound
with z_bound = max(ca_in, lz/2); update the call to
get_bounding_indices(center(3) - z_bound, center(3) + z_bound, z_cc, ll, lr) so
ll/lr cover the full airfoil span when marking IB cells.
- Around line 1257-1265: Add an explicit runtime check at the start of
subroutine get_bounding_indices to enforce the invariant that buff_size >=
gp_layers: validate the declared dummy array dimension(-buff_size:) against the
module/global gp_layers constant and, if the check fails, abort with a clear
error (e.g., error stop or process-level abort) so callers that set left_index =
-gp_layers cannot cause out-of-bounds access on cell_centers; place this check
immediately after the subroutine header and before using left_index/itr_left.

In `@src/simulation/m_ibm.fpp`:
- Around line 596-673: The GPU loop leaves ghost_points_in(local_idx)%DB(3)
uninitialized when p == 0; fix by explicitly initializing the third component of
the DB array for ghost points inside the is_gp branch: after setting DB(1) and
DB(2) (within the block guarded by is_gp) add logic that sets
ghost_points_in(local_idx)%DB(3) = 0 when p == 0 (or alternatively always
initialize DB(3) before the conditional on p), referencing the symbols
ghost_points_in, local_idx, p and DB so the change is placed in the same
GPU_PARALLEL_LOOP where is_gp is determined.

In `@src/simulation/m_time_steppers.fpp`:
- Line 806: The NVTX range name contains a typo: update the nvtxStartRange call
that currently uses "PROPAGATE-IMERSED-BOUNDARIES" (the call
nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")) to the correct label
"PROPAGATE-IMMERSED-BOUNDARIES" so profiling tools show the expected name.

---

Outside diff comments:
In `@src/simulation/m_ibm.fpp`:
- Around line 457-530: The GPU kernel macro invocation lists s_cc as private but
s_cc is a pointer reassigned (via =>) to x_cc, y_cc, or z_cc inside the loop;
change the kernel parameter handling to use deviceptr for this pointer or
eliminate the pointer reassignment. Specifically, update the GPU region (the
$:GPU_PARALLEL_LOOP(...) / $:END_GPU_PARALLEL_LOOP block) so that s_cc is passed
as deviceptr='[s_cc]' instead of appearing in the private list, or refactor the
inner loop in the do dim loop to index x_cc, y_cc, z_cc directly based on dim
(removing s_cc and its pointer reassignments) while keeping the rest of the
logic that computes ip_loc, ip_grid, norm, and bound intact (references: s_cc,
x_cc, y_cc, z_cc, ghost_points_in, GPU_PARALLEL_LOOP).
- Around line 1135-1143: s_finalize_ibm_module currently deallocates sf and
airfoil grids but misses freeing ghost_points, inner_points and models allocated
earlier; update s_finalize_ibm_module to deallocate ghost_points, inner_points
and models (matching allocations in s_ibm_setup and s_initialize_ibm_module) and
ensure any corresponding GPU data is exited/released if present (mirror
GPU_ENTER_DATA copyin for ghost_points/inner_points). Locate the allocations for
models (in s_initialize_ibm_module) and ghost_points/inner_points (in
s_ibm_setup) and add corresponding DEALLOCATE calls and GPU exit/cleanup in
s_finalize_ibm_module to prevent host and device memory leaks.

---

Duplicate comments:
In `@src/common/m_model.fpp`:
- Around line 1394-1407: The subroutine s_pack_model_for_gpu unconditionally
allocates ma%trs_v and ma%trs_n which will crash if called again on the same
object; update s_pack_model_for_gpu to check allocated(ma%trs_v) and
allocated(ma%trs_n) and deallocate them (or use move_alloc/resizing strategy)
before allocating new arrays, ensuring ma%ntrs is assigned after any
deallocation and that allocations match ma%model%ntrs; reference ma%trs_v,
ma%trs_n, ma%ntrs, and s_pack_model_for_gpu when making the change.
- Around line 21-42: The public GPU-flat arrays (gpu_ntrs, gpu_trs_v, gpu_trs_n,
gpu_boundary_v, gpu_interpolated_boundary_v, gpu_interpolate,
gpu_boundary_edge_count, gpu_total_vertices, stl_bounding_boxes) and the
models(t_model_array) are not being deallocated; update the module finalization
path by implementing/augmenting s_model_free to safely deallocate each
allocatable (using allocated() checks) and nullify targets as needed, and ensure
s_model_free is called on teardown (or register a module final subroutine) so
all GPU/host memory is released.
- Around line 1203-1261: The subroutine f_distance_normals_3D_flat can leave
tri_idx at 0 when ntrs == 0, causing invalid access to trs_n(:,tri_idx,pid); add
an early guard at the top of the routine: if ntrs == 0 then set normals(1:3) =
0._wp (or another agreed sentinel) and distance = 1.e12_wp (matching dist_min)
and return; this prevents the later accesses to trs_n(1,tri_idx,pid),
trs_n(2,tri_idx,pid), trs_n(3,tri_idx,pid) from using tri_idx == 0.

In `@src/simulation/m_ib_patches.fpp`:
- Around line 1120-1143: The module-level gpu_* arrays (gpu_ntrs, gpu_trs_v,
gpu_trs_n, gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices,
gpu_boundary_v, gpu_interpolated_boundary_v) are being allocated in
s_instantiate_STL_models without allocation guards, causing double-allocation on
restart; modify s_instantiate_STL_models to check allocation status before
allocating (e.g., if (.not. allocated(gpu_ntrs)) then allocate(...) ) or
explicitly deallocate before re-allocating, applying this guard to every
allocate call for the listed symbols and for the conditional allocations of
gpu_boundary_v and gpu_interpolated_boundary_v so repeated calls no longer
error.
- Around line 857-884: The kernel currently does a full-grid scan (loops over
i=-gp_layers..m+gp_layers and j=-gp_layers..n+gp_layers) instead of using
get_bounding_indices like other STL paths, and it omits gpu arrays from the GPU
data clauses; fix by computing narrowed loop bounds via get_bounding_indices for
the patch before the GPU_PARALLEL_LOOP and iterate only over those indices
(replace the full-range i/j loops with the bounding ranges), and add gpu_ntrs,
gpu_trs_v, and gpu_trs_n to the GPU data clause (copyin/present) of the
GPU_PARALLEL_LOOP so f_model_is_inside_flat can safely access them while keeping
ib_markers%sf writes as before.
- Around line 1074-1090: The code allocates stl_bounding_boxes inside the
per-patch loop causing reallocation errors and conditionally sets
models(patch_id)%total_vertices only when interpolate is true even though
interpolate is forced false earlier; also interpolation calls pass full dx/dy/dz
arrays instead of scalar local spacings. Fix by moving the
allocate(stl_bounding_boxes(...)) so it happens once before the patch_id loop
(or guard it with allocated(...) check) to avoid reallocating, ensure
models(patch_id)%total_vertices is assigned unconditionally (or set
models(patch_id)%total_vertices = 0 when not interpolating) and only set
models(patch_id)%interpolated_boundary_v when interpolate is true, and change
calls that currently pass (/dx, dy, dz/) to pass the scalar local spacings
(/dx_local, dy_local, dz_local/) (look for symbols stl_bounding_boxes, models(%
)%total_vertices, interpolate, interpolated_boundary_v, and the interpolation
routine call sites using dx/dy/dz).

---

Nitpick comments:
In `@src/simulation/m_ib_patches.fpp`:
- Around line 929-931: The s_ib_3d_model kernel currently iterates do i, do j,
do k which is inconsistent with other 3D patch routines and causes non-coalesced
GPU writes to ib_markers%sf(i,j,k); change the loop order to match s_ib_sphere /
s_ib_cuboid / s_ib_cylinder by making k the outermost loop, j the middle, and i
the innermost (i.e., iterate k then j then i) so adjacent vector-lane threads
write contiguous memory in Fortran column-major layout.
- Around line 814-829: s_ib_ellipse currently scans the full ghost-extended grid
inside the GPU_PARALLEL_LOOP; update it to compute a tight loop range using the
existing get_bounding_indices helper (like other primitives) and only iterate
i,j over that bounding box (clamped by the ghost layers gp_layers and domain
extents m,n). Specifically, before the double loop, call get_bounding_indices
with the ellipse center and axes (center, ellipse_coeffs, inverse_rotation) to
produce min_i,max_i,min_j,max_j, then replace the do j/from
-gp_layers,n+gp_layers and do i/from -gp_layers,m+gp_layers ranges with the
returned bounded ranges; keep the same body that computes xy_local (using
x_cc,y_cc) and sets ib_markers%sf(i,j,0)=patch_id when inside the ellipse so
behavior and data references remain unchanged.
- Around line 62-114: The PR omits completing the GPU checklist for changes that
add GPU kernels—specifically s_ib_model and s_ib_3d_model—and the
get_bounding_indices refactor; run and record GPU validation by (1) building and
running the GPU test suite that covers all IB primitives including s_ib_model
and s_ib_3d_model, (2) comparing GPU results to CPU baselines (numerical
diffs/plots) and attaching the logs, and (3) confirming vendor coverage by
running the same GPU tests on available vendors (e.g., NVIDIA, AMD, Intel) or
noting which vendors were tested and any failures; update the PR template
checklist with pass/fail status, attach the test commands, output files, and any
small fixes needed to make get_bounding_indices and the new kernels
GPU-compatible.

Comment on lines +541 to +543
rand_seed = int(point(1)*73856093_wp) + &
int(point(2)*19349663_wp) + &
int(point(3)*83492791_wp)

⚠️ Potential issue | 🟠 Major

Lint failure: _wp suffix on integer hash constants loses precision in single-precision builds.

The pipeline flags 73856093_wp, 19349663_wp, 83492791_wp. With the _wp suffix and no decimal point these are integer literals of kind wp rather than reals, and the values exceed the largest integer a single-precision real can hold exactly (2^24, about 16.8M), so once they are converted for the multiplication they are silently rounded when wp is single precision. Use explicit conversion instead.

Proposed fix
-        rand_seed = int(point(1)*73856093_wp) + &
-                    int(point(2)*19349663_wp) + &
-                    int(point(3)*83492791_wp)
+        rand_seed = int(point(1)*real(73856093, wp)) + &
+                    int(point(2)*real(19349663, wp)) + &
+                    int(point(3)*real(83492791, wp))

Apply the same fix to lines 599–601 in f_model_is_inside_flat.
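
For reference, a minimal standalone Fortran check (not MFC code) of the rounding concern: 73856093 lies between 2^26 and 2^27, where single precision can only represent multiples of 8, so converting it to a 32-bit real and back does not return the original value.

program sp_rounding_demo
    implicit none
    integer, parameter :: sp = kind(1.0)   ! default single precision on most compilers
    integer :: round_trip
    round_trip = nint(real(73856093, sp))
    print *, 'original constant:', 73856093
    print *, 'after real(sp)   :', round_trip   ! 73856096 under IEEE round-to-nearest
end program sp_rounding_demo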

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rand_seed = int(point(1)*73856093_wp) + &
int(point(2)*19349663_wp) + &
int(point(3)*83492791_wp)
rand_seed = int(point(1)*real(73856093, wp)) + &
int(point(2)*real(19349663, wp)) + &
int(point(3)*real(83492791, wp))
🧰 Tools
🪛 GitHub Actions: Lint Source

[error] 541-541: Pattern match found: '73856093_wp'


[error] 542-542: Pattern match found: '19349663_wp'


[error] 543-543: Pattern match found: '83492791_wp'

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 541 - 543, The hash constants are
written with the `_wp` suffix (e.g. 73856093_wp) which creates real literals
that will be rounded in single-precision builds; change them to use explicit
conversion from integer to the working kind, e.g. replace occurrences like
int(point(1)*73856093_wp) with int(point(1)*real(73856093, wp)) (or equivalently
real(73856093, wp) wrapped as needed) in the rand_seed computation in m_model
(the lines setting rand_seed) and make the identical change in
f_model_is_inside_flat (the three constants on lines ~599–601) so the integer
constants are converted explicitly to kind wp instead of being written as real
literals.

nInOrOut = nInOrOut + mod(nHits, 2)
end do

fraction = real(nInOrOut)/real(spc)

⚠️ Potential issue | 🟡 Minor

real() without kind parameter — use real(..., wp) for precision consistency.

real(nInOrOut)/real(spc) uses default real kind, which may differ from wp. Same issue at line 629 in f_model_is_inside_flat. As per coding guidelines: avoid implicit precision; use real(nInOrOut, wp)/real(spc, wp).

Proposed fix (line 576)
-        fraction = real(nInOrOut)/real(spc)
+        fraction = real(nInOrOut, wp)/real(spc, wp)
Proposed fix (line 629)
-        fraction = real(nInOrOut)/real(spc)
+        fraction = real(nInOrOut, wp)/real(spc, wp)
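
A small standalone Fortran comparison (not MFC code; plain integers stand in for nInOrOut and spc) shows the kind mismatch concretely: the default-kind quotient is computed in single precision and only then widened, so it differs from the quotient computed in wp.

program default_kind_demo
    implicit none
    integer, parameter :: wp = kind(1.d0)
    real(wp) :: mixed, consistent
    mixed      = real(7)/real(3)           ! single-precision quotient, widened on assignment
    consistent = real(7, wp)/real(3, wp)   ! quotient computed in wp
    print *, mixed                         ! ~2.3333332538604736
    print *, consistent                    ! ~2.3333333333333335
end program default_kind_demo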
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fraction = real(nInOrOut)/real(spc)
fraction = real(nInOrOut, wp)/real(spc, wp)
Suggested change
fraction = real(nInOrOut)/real(spc)
fraction = real(nInOrOut, wp)/real(spc, wp)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` at line 576, The expression computing fraction uses
default real kind; update the conversion to enforce working precision by
replacing real(nInOrOut)/real(spc) with real(nInOrOut, wp)/real(spc, wp) and
apply the same change in the f_model_is_inside_flat occurrence (line referencing
spc and nInOrOut) so both numerator and denominator use the wp kind
consistently.

call f_distance_normals_3D_flat(gpu_ntrs(patch_id), gpu_trs_v, gpu_trs_n, patch_id, xyz_local, normals, distance)

! Get the shortest distance between the cell center and the interpolated model boundary
if (interpolate) then

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, examine the relevant lines in m_compute_levelset.fpp
cat -n src/simulation/m_compute_levelset.fpp | sed -n '645,660p'

Repository: MFlowCode/MFC

Length of output: 677


🏁 Script executed:

# Check lines 690-710 to see both occurrences
cat -n src/simulation/m_compute_levelset.fpp | sed -n '690,720p'

Repository: MFlowCode/MFC

Length of output: 1784


🏁 Script executed:

# Search for similar integer-as-logical patterns in the codebase
rg -nP 'if\s*\(\s*(?!.*==|.*\.or\.|.*\.and\.).*(?:interpolate|gpu_interpolate|gpu_ntrs|gpu_boundary_edge_count)\s*\)' src/simulation/ src/common/ --max-count=30

Repository: MFlowCode/MFC

Length of output: 357


🏁 Script executed:

# Check the context of similar patterns in m_ib_patches.fpp
cat -n src/simulation/m_ib_patches.fpp | sed -n '1035,1045p'

Repository: MFlowCode/MFC

Length of output: 663


🏁 Script executed:

cat -n src/simulation/m_ib_patches.fpp | sed -n '1080,1095p'

Repository: MFlowCode/MFC

Length of output: 804


🏁 Script executed:

# Also verify if interpolate is declared as integer in m_ib_patches.fpp
rg -n 'integer.*interpolate|integer :: .*interpolate' src/simulation/m_ib_patches.fpp | head -5

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

# Search for interpolate declaration in m_ib_patches.fpp context
rg -B5 -A5 'if (interpolate)' src/simulation/m_ib_patches.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

# Check the subroutine signature and local declarations in m_ib_patches.fpp
rg -n 'subroutine.*m_ib_patches|logical.*interpolate|integer.*interpolate' src/simulation/m_ib_patches.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 172


Integer variable used as logical condition; change if (interpolate) to if (interpolate == 1).

The variable interpolate is declared as integer (line 650), but line 694 treats it as a logical condition, which is non-standard Fortran. Line 707 in the same file correctly uses if (interpolate == 1), creating an inconsistency within the same routine.

Proposed fix
-            if (interpolate) then
+            if (interpolate == 1) then
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (interpolate) then
if (interpolate == 1) then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_compute_levelset.fpp` at line 694, The conditional uses an
integer "interpolate" as a logical; change the check at the if statement
(currently "if (interpolate)") to an explicit integer comparison "if
(interpolate == 1)" to match the other usage in this routine and standard
Fortran; update the same conditional in routine m_compute_levelset (variable
interpolate) so all branches use "interpolate == 1" consistently.

Comment on lines +428 to +430
call get_bounding_indices(center(1) - ca_in, center(1) + ca_in, x_cc, il, ir)
call get_bounding_indices(center(2) - ca_in, center(2) + ca_in, y_cc, jl, jr)
call get_bounding_indices(center(3) - ca_in, center(3) + ca_in, z_cc, ll, lr)

⚠️ Potential issue | 🔴 Critical

Critical: z-direction bounding uses chord ca_in instead of half-span lz/2, silently excluding wing-tip cells.

Lines 428–429 correctly use ca_in for the x/y (chord-section) bounds, but line 430 reuses ca_in for z even though the airfoil spans lz in that direction (z_max = lz/2, z_min = -lz/2, lines 355–356). For any realistic wing where lz > 2*ca_in, the loop over l never reaches cells with |z_cc(l) - center(3)| > ca_in, so those cells are never tested against the airfoil geometry and remain unmarked — silently corrupting the IB marker field.

🐛 Proposed fix
-        call get_bounding_indices(center(3) - ca_in, center(3) + ca_in, z_cc, ll, lr)
+        call get_bounding_indices(center(3) - lz/2._wp, center(3) + lz/2._wp, z_cc, ll, lr)

For a rotated airfoil the tightest safe bound is max(ca_in, lz/2) per axis; the above is correct for the unrotated (common) case and consistent with how z_min/z_max are defined.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
call get_bounding_indices(center(1) - ca_in, center(1) + ca_in, x_cc, il, ir)
call get_bounding_indices(center(2) - ca_in, center(2) + ca_in, y_cc, jl, jr)
call get_bounding_indices(center(3) - ca_in, center(3) + ca_in, z_cc, ll, lr)
call get_bounding_indices(center(1) - ca_in, center(1) + ca_in, x_cc, il, ir)
call get_bounding_indices(center(2) - ca_in, center(2) + ca_in, y_cc, jl, jr)
call get_bounding_indices(center(3) - lz/2._wp, center(3) + lz/2._wp, z_cc, ll, lr)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 428 - 430, The z-direction
bounding call to get_bounding_indices currently reuses ca_in and thus can
exclude wing-tip cells; change the third call so the z bounds use the correct
half-span (use max(ca_in, lz/2) or at least lz/2) instead of ca_in by passing
center(3) - z_bound and center(3) + z_bound with z_bound = max(ca_in, lz/2);
update the call to get_bounding_indices(center(3) - z_bound, center(3) +
z_bound, z_cc, ll, lr) so ll/lr cover the full airfoil span when marking IB
cells.

Comment on lines +941 to +949
eta = f_model_is_inside_flat(gpu_ntrs(patch_id), &
gpu_trs_v, gpu_trs_n, &
patch_id, &
xyz_local, (/dx(i), dy(j), dz(k)/), &
spc)

! Reading STL boundary vertices and compute the levelset and levelset_norm
if (eta > patch_ib(patch_id)%model_threshold) then
ib_markers_sf(i, j, k) = patch_id
ib_markers%sf(i, j, k) = patch_id

⚠️ Potential issue | 🟡 Minor

s_ib_3d_model reads patch_ib(patch_id)%model_threshold inside the GPU kernel instead of the scalar threshold already copied in.

Line 910 captures threshold = patch_ib(patch_id)%model_threshold and line 928 lists it in copyin, yet line 948 discards it and dereferences the derived-type field again inside the kernel. s_ib_model (line 877) uses the local scalar correctly. Beyond inconsistency, if patch_ib is not independently present on the device this is a runtime fault.

🐛 Proposed fix
-                    if (eta > patch_ib(patch_id)%model_threshold) then
+                    if (eta > threshold) then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 941 - 949, s_ib_3d_model is
dereferencing patch_ib(patch_id)%model_threshold inside the GPU kernel even
though you already copied the scalar threshold into the kernel; change the
kernel check to use the local scalar threshold (the scalar set at line ~910 and
listed in copyin) instead of accessing patch_ib(patch_id)%model_threshold so the
kernel uses the device-local scalar and avoids a possible host-to-device runtime
fault — update the conditional around the call to f_model_is_inside_flat and the
assignment to ib_markers%sf(i,j,k) to compare eta against threshold rather than
patch_ib(...).

Comment on lines +1257 to +1265
subroutine get_bounding_indices(left_bound, right_bound, cell_centers, left_index, right_index)

real(wp), intent(in) :: left_bound, right_bound
integer, intent(inout) :: left_index, right_index
real(wp), dimension(-buff_size:), intent(in) :: cell_centers

integer :: itr_left, itr_middle, itr_right

itr_left = left_index

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify buff_size and gp_layers are always equal in simulation code
echo "=== buff_size definition ==="
rg -n 'buff_size\s*=' src/simulation/ src/common/ --type fpp -B1 -A1 | head -30

echo "=== gp_layers definition/assignment ==="
rg -n 'gp_layers\s*=' src/simulation/ src/common/ --type fpp -B1 -A1 | head -30

echo "=== get_bounding_indices call sites ==="
rg -n 'call get_bounding_indices' src/simulation/ --type fpp -B2 -A1 | head -40

echo "=== cell_centers actual array declarations (x_cc, y_cc, z_cc) ==="
rg -nP 'x_cc|y_cc|z_cc' src/simulation/m_global_parameters.fpp src/common/m_global_parameters.fpp 2>/dev/null | grep -i 'allocat\|dimension' | head -20

Repository: MFlowCode/MFC

Length of output: 970


🏁 Script executed:

#!/bin/bash
# Search without --type flag
echo "=== buff_size definition/use ==="
rg -n 'buff_size' src/simulation/m_ib_patches.fpp src/simulation/m_global_parameters.fpp | head -40

echo "=== gp_layers definition/use ==="
rg -n 'gp_layers' src/simulation/m_ib_patches.fpp src/simulation/m_global_parameters.fpp | head -40

echo "=== Check m_ib_patches imports and parameter section ==="
head -100 src/simulation/m_ib_patches.fpp | grep -A 20 'subroutine\|parameter\|buff_size\|gp_layers'

echo "=== get_bounding_indices all calls with context ==="
rg -n 'get_bounding_indices' src/simulation/ -B3 -A1 | head -80

Repository: MFlowCode/MFC

Length of output: 12541


🏁 Script executed:

#!/bin/bash
# Find gp_layers definition/declaration
echo "=== gp_layers parameter/variable declaration ==="
rg -n 'integer.*gp_layers|gp_layers.*integer' src/simulation/ src/common/ -B2 -A2

echo "=== Search for parameter declarations in m_ib_patches ==="
head -200 src/simulation/m_ib_patches.fpp | grep -i 'parameter\|gp_layers'

echo "=== Search for gp_layers in m_global_parameters ==="
rg -n 'gp_layers' src/simulation/m_global_parameters.fpp | head -20

echo "=== Check if buff_size and gp_layers are related ==="
rg -n 'buff_size|gp_layers' src/simulation/m_global_parameters.fpp | head -50

Repository: MFlowCode/MFC

Length of output: 2975


🏁 Script executed:

#!/bin/bash
# Find where buff_size is assigned
echo "=== buff_size assignments ==="
rg -n 'buff_size\s*=' src/ -B2 -A2 | head -80

echo "=== Check if buff_size is read from input or derived ==="
rg -n 'buff_size' src/simulation/m_global_parameters.fpp | grep -E 'read|input|set|=|parameter'

echo "=== Full context of buff_size variable declaration and related code ==="
sed -n '335,360p' src/simulation/m_global_parameters.fpp

Repository: MFlowCode/MFC

Length of output: 3661


🏁 Script executed:

#!/bin/bash
# Verify there's no explicit documentation or assertion enforcing buff_size >= gp_layers
echo "=== Check for assertions or guards on buff_size >= gp_layers ==="
rg -n 'buff_size.*gp_layers|gp_layers.*buff_size' src/ --type-not test

echo "=== Check minimum values in buff_size calculation path ==="
sed -n '125,155p' src/common/m_helper_basic.fpp

echo "=== Verify IBM flag triggers buff_size >= 10 ==="
rg -n 'if.*ib.*then' src/common/m_helper_basic.fpp -A3 | head -20

Repository: MFlowCode/MFC

Length of output: 1463


Add an assertion to enforce buff_size ≥ gp_layers or document the implicit invariant.

The dummy array cell_centers is declared dimension(-buff_size:) (line 1261), and all callers initialize left_index = −gp_layers (−3). While the code is currently safe because get_bounding_indices is only called from IBM routines where buff_size ≥ 10, there is no explicit guard preventing out-of-bounds access if the invariant is violated. Consider adding an assertion at the subroutine entry point:

if (buff_size < gp_layers) then
    error stop 'get_bounding_indices: buff_size must be >= gp_layers'
end if

This makes the coupling between buff_size and gp_layers explicit and prevents latent bugs if the code is refactored or called from different contexts in the future.
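
To make the invariant concrete, here is a minimal standalone sketch (not MFC code; buff_size, gp_layers, and the -buff_size: lower bound are borrowed from the snippet above): the dummy array starts at -buff_size while the caller probes down to -gp_layers, so the guard only passes when buff_size >= gp_layers.

program bounding_invariant_demo
    implicit none
    integer, parameter :: buff_size = 5, gp_layers = 3
    real :: cc(-buff_size:buff_size)
    integer :: i
    cc = [(real(i), i = -buff_size, buff_size)]   ! stand-in cell centers
    call probe(cc, -gp_layers)
contains
    subroutine probe(cell_centers, left_index)
        real, intent(in) :: cell_centers(-buff_size:)   ! same lower bound as get_bounding_indices
        integer, intent(in) :: left_index
        if (buff_size < gp_layers) error stop 'buff_size must be >= gp_layers'
        print *, 'leftmost probed cell center:', cell_centers(left_index)
    end subroutine probe
end program bounding_invariant_demo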

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 1257 - 1265, Add an explicit
runtime check at the start of subroutine get_bounding_indices to enforce the
invariant that buff_size >= gp_layers: validate the declared dummy array
dimension(-buff_size:) against the module/global gp_layers constant and, if the
check fails, abort with a clear error (e.g., error stop or process-level abort)
so callers that set left_index = -gp_layers cannot cause out-of-bounds access on
cell_centers; place this check immediately after the subroutine header and
before using left_index/itr_left.

Comment on lines +596 to +673
count = 0
count_i = 0
gp_layers_z = gp_layers
if (p == 0) gp_layers_z = 0

$:GPU_PARALLEL_LOOP(private='[i,j,k,ii,jj,kk,is_gp,local_idx]', copyin='[count,count_i, x_domain, y_domain, z_domain]', firstprivate='[gp_layers,gp_layers_z]', collapse=3)
do i = 0, m
do j = 0, n
if (p == 0) then
! 2D
if (ib_markers%sf(i, j, 0) /= 0) then
subsection_2D = ib_markers%sf( &
i - gp_layers:i + gp_layers, &
j - gp_layers:j + gp_layers, 0)
if (any(subsection_2D == 0)) then
ghost_points_in(count)%loc = [i, j, 0]
patch_id = ib_markers%sf(i, j, 0)
ghost_points_in(count)%ib_patch_id = &
patch_id

ghost_points_in(count)%slip = patch_ib(patch_id)%slip
! ghost_points(count)%rank = proc_rank
do k = 0, p
if (ib_markers%sf(i, j, k) /= 0) then
is_gp = .false.
marker_search: do ii = i - gp_layers, i + gp_layers
do jj = j - gp_layers, j + gp_layers
do kk = k - gp_layers_z, k + gp_layers_z
if (ib_markers%sf(ii, jj, kk) == 0) then
! if any neighbors are not in the IB, it is a ghost point
is_gp = .true.
exit marker_search
end if
end do
end do
end do marker_search

if (is_gp) then
$:GPU_ATOMIC(atomic='capture')
count = count + 1
local_idx = count
$:END_GPU_ATOMIC_CAPTURE()

ghost_points_in(local_idx)%loc = [i, j, k]
patch_id = ib_markers%sf(i, j, k)
ghost_points_in(local_idx)%ib_patch_id = patch_id

ghost_points_in(local_idx)%slip = patch_ib(patch_id)%slip

if ((x_cc(i) - dx(i)) < x_domain%beg) then
ghost_points_in(count)%DB(1) = -1
ghost_points_in(local_idx)%DB(1) = -1
else if ((x_cc(i) + dx(i)) > x_domain%end) then
ghost_points_in(count)%DB(1) = 1
ghost_points_in(local_idx)%DB(1) = 1
else
ghost_points_in(count)%DB(1) = 0
ghost_points_in(local_idx)%DB(1) = 0
end if

if ((y_cc(j) - dy(j)) < y_domain%beg) then
ghost_points_in(count)%DB(2) = -1
ghost_points_in(local_idx)%DB(2) = -1
else if ((y_cc(j) + dy(j)) > y_domain%end) then
ghost_points_in(count)%DB(2) = 1
ghost_points_in(local_idx)%DB(2) = 1
else
ghost_points_in(count)%DB(2) = 0
ghost_points_in(local_idx)%DB(2) = 0
end if

count = count + 1

else
inner_points_in(count_i)%loc = [i, j, 0]
patch_id = ib_markers%sf(i, j, 0)
inner_points_in(count_i)%ib_patch_id = &
patch_id
inner_points_in(count_i)%slip = patch_ib(patch_id)%slip
count_i = count_i + 1

end if
end if
else
! 3D
do k = 0, p
if (ib_markers%sf(i, j, k) /= 0) then
subsection_3D = ib_markers%sf( &
i - gp_layers:i + gp_layers, &
j - gp_layers:j + gp_layers, &
k - gp_layers:k + gp_layers)
if (any(subsection_3D == 0)) then
ghost_points_in(count)%loc = [i, j, k]
patch_id = ib_markers%sf(i, j, k)
ghost_points_in(count)%ib_patch_id = &
ib_markers%sf(i, j, k)
ghost_points_in(count)%slip = patch_ib(patch_id)%slip

if ((x_cc(i) - dx(i)) < x_domain%beg) then
ghost_points_in(count)%DB(1) = -1
else if ((x_cc(i) + dx(i)) > x_domain%end) then
ghost_points_in(count)%DB(1) = 1
else
ghost_points_in(count)%DB(1) = 0
end if

if ((y_cc(j) - dy(j)) < y_domain%beg) then
ghost_points_in(count)%DB(2) = -1
else if ((y_cc(j) + dy(j)) > y_domain%end) then
ghost_points_in(count)%DB(2) = 1
else
ghost_points_in(count)%DB(2) = 0
end if

if (p /= 0) then
if ((z_cc(k) - dz(k)) < z_domain%beg) then
ghost_points_in(count)%DB(3) = -1
ghost_points_in(local_idx)%DB(3) = -1
else if ((z_cc(k) + dz(k)) > z_domain%end) then
ghost_points_in(count)%DB(3) = 1
ghost_points_in(local_idx)%DB(3) = 1
else
ghost_points_in(count)%DB(3) = 0
ghost_points_in(local_idx)%DB(3) = 0
end if
end if

count = count + 1
else
inner_points_in(count_i)%loc = [i, j, k]
patch_id = ib_markers%sf(i, j, k)
inner_points_in(count_i)%ib_patch_id = &
ib_markers%sf(i, j, k)
inner_points_in(count_i)%slip = patch_ib(patch_id)%slip
else
$:GPU_ATOMIC(atomic='capture')
count_i = count_i + 1
local_idx = count_i
$:END_GPU_ATOMIC_CAPTURE()

inner_points_in(local_idx)%loc = [i, j, k]
patch_id = ib_markers%sf(i, j, k)
inner_points_in(local_idx)%ib_patch_id = patch_id
inner_points_in(local_idx)%slip = patch_ib(patch_id)%slip

count_i = count_i + 1
end if
end if
end do
end if
end if
end do
end do
end do
$:END_GPU_PARALLEL_LOOP()

⚠️ Potential issue | 🟡 Minor

ghost_points_in(local_idx)%DB(3) uninitialized when p == 0.

In the 2D case (p == 0), the if (p /= 0) block (lines 647–655) is skipped, leaving DB(3) uninitialized for ghost points. While DB(3) may not be used in the 2D path of s_compute_image_points (since num_dims limits the loop), uninitialized data in a GPU-resident struct can cause subtle issues if any downstream code reads it.

Proposed fix
+                            ghost_points_in(local_idx)%DB(3) = 0
                             if (p /= 0) then
                                 if ((z_cc(k) - dz(k)) < z_domain%beg) then
                                     ghost_points_in(local_idx)%DB(3) = -1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 596 - 673, The GPU loop leaves
ghost_points_in(local_idx)%DB(3) uninitialized when p == 0; fix by explicitly
initializing the third component of the DB array for ghost points inside the
is_gp branch: after setting DB(1) and DB(2) (within the block guarded by is_gp)
add logic that sets ghost_points_in(local_idx)%DB(3) = 0 when p == 0 (or
alternatively always initialize DB(3) before the conditional on p), referencing
the symbols ghost_points_in, local_idx, p and DB so the change is placed in the
same GPU_PARALLEL_LOOP where is_gp is determined.

integer :: i
logical :: forces_computed

call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")

⚠️ Potential issue | 🟡 Minor

Typo in NVTX range name: "IMERSED" → "IMMERSED".

This label appears in profiling tools (e.g., Nsight Systems). The PR description images already show "PROPAGATE-IMMERSED-BOUNDARIES" as the expected label.

Proposed fix
-        call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
+        call nvtxStartRange("PROPAGATE-IMMERSED-BOUNDARIES")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
call nvtxStartRange("PROPAGATE-IMMERSED-BOUNDARIES")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_time_steppers.fpp` at line 806, The NVTX range name contains
a typo: update the nvtxStartRange call that currently uses
"PROPAGATE-IMERSED-BOUNDARIES" (the call
nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")) to the correct label
"PROPAGATE-IMMERSED-BOUNDARIES" so profiling tools show the expected name.


Labels

size:XL This PR changes 500-999 lines, ignoring generated files
